THE ANNALS
of
MATHEMATICAL
STATISTICS


(FOUNDED BY H. C. CARVER)


THE OFFICIAL JOURNAL OF THE INSTITUTE
OF MATHEMATICAL STATISTICS


Contents 


PAGE 


Further Contributions to the Problem of Serial Correlation. WILFRID J. DIXON

On a Statistical Problem Arising in the Classification of an Individual into One of Two Groups. ABRAHAM WALD

Asymptotic Distribution of Runs Up and Down. J. WOLFOWITZ .... 163

Statistical Analysis of Certain Types of Random Functions. H. HURWITZ AND M. KAC .... 173

Random Alms. PAUL R. HALMOS

On Biases in Estimation Due to the Use of Preliminary Tests of Significance. T. A. BANCROFT

The Probability of Convergence of an Iterative Process of Inverting a Matrix. JOSEPH ULLMAN

Notes:

On Distribution-free Tolerance Limits in Random Sampling. HERBERT ROBBINS

A Formula for Sample Sizes for Population Tolerance Limits. H. SCHEFFÉ and J. W. TUKEY

A Generalization of Waring's Formula.

Note on the Variance and Best Estimates. H. G. LANDAU

News and Notices

Report on the Washington Meeting of the Institute


Vol. XV, No. 2 — June, 1944 





THE ANNALS 
OF MATHEMATICAL STATISTICS 


EDITED BY 
S. S. WILKS, Editor
A. T. CRAIG H. HOTELLING
W. E. DEMING J. NEYMAN
T. C. FRY W. A. SHEWHART


WITH THE COOPERATION OF


C. EISENHART A. M. MOOD
W. FELLER H. SCHEFFÉ
P. G. HOEL A. WALD
W. G. MADOW J. WOLFOWITZ


The ANNALS OF MATHEMATICAL STATISTICS is published quarterly by the Institute of Mathematical Statistics, Mt. Royal & Guilford Aves., Baltimore 2, Md. Subscriptions, renewals, orders for back numbers and other business communications should be sent to the ANNALS OF MATHEMATICAL STATISTICS, Mt. Royal & Guilford Aves., Baltimore 2, Md., or to the Secretary of the Institute of Mathematical Statistics, P. S. Dwyer, 116 Rackham Hall, University of Michigan, Ann Arbor, Mich.

Changes in mailing address which are to become effective for a given issue should be reported to the Secretary on or before the 15th of the month preceding the month of that issue. The months of issue are March, June, September and December. Because of war-time difficulties of publication, issues may often be from two to four weeks late in appearing. Subscribers are therefore requested to wait at least 30 days after month of issue before making inquiries concerning non-delivery.

Manuscripts for publication in the ANNALS OF MATHEMATICAL STATISTICS should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts should be typewritten double-spaced with wide margins, and the original copy should be submitted. Footnotes should be reduced to a minimum and whenever possible replaced by a bibliography at the end of the paper; formulae in footnotes should be avoided. Figures, charts, and diagrams should be drawn on plain white paper or tracing cloth in black India ink twice the size they are to be printed. Authors are requested to keep in mind typographical difficulties of complicated mathematical formulae.

Authors will ordinarily receive only galley proofs. Fifty reprints without covers will be furnished free. Additional reprints and covers furnished at cost.

The subscription price for the ANNALS is $5.00 per year. Single copies $1.50. Back numbers are available at $5.00 per volume, or $1.50 per single issue.


COMPOSED AND PRINTED AT THE
WAVERLY PRESS, INC.
BALTIMORE, MD., U. S. A.


Entered as second-class matter at the Post Office at Baltimore, Maryland, under the Act of March 3, 1879
















FURTHER CONTRIBUTIONS TO THE PROBLEM OF SERIAL 
CORRELATION 


By WILFRID J. DIXON


Princeton University 


1. Introduction. Recently, there has been an increasing interest in the study 
of the serial correlation of observations. The development of the distribution 
theory and significance criteria was retarded by the fact that the successive dif- 
ferences or successive products of statistical variates are not independent. How- 
ever, these difficulties have been overcome to a considerable extent by recent work 
of several authors. In order to indicate the nature of the contributions embodied 
in the present paper, it will be necessary to describe rather precisely the contri- 
butions of these authors. 

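Suppose $x_1, x_2, \cdots, x_n$ are $n$ independent observations of a random variable $x$ which is normally distributed with mean $a$ and variance $\sigma^2$. Let us define

(1.1)
$$\delta_{n-1}^2 = \sum_{i=1}^{n-1} (x_{i+1} - x_i)^2, \qquad \delta_n^2 = \sum_{i=1}^{n} (x_{i+1} - x_i)^2,$$
$$C_{n-1} = \sum_{i=1}^{n-1} (x_i - \bar{x})(x_{i+1} - \bar{x}), \qquad C_n = \sum_{i=1}^{n} (x_i - \bar{x})(x_{i+1} - \bar{x}),$$
$${}_LC_n = \sum_{i=1}^{n} (x_i - \bar{x})(x_{i+L} - \bar{x}), \qquad V_n = \sum_{i=1}^{n} (x_i - \bar{x})^2,$$

in which $x_{n+i} = x_i$. The ratio of any of the first five values to $V_n$ will be a measure of the relation between the successive observations $x_i$.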
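The circular definitions above can be sketched in a few lines of code. This is only an illustration of (1.1): the function name `serial_statistics` and the dictionary keys are hypothetical labels chosen here, not notation from the paper.

```python
def serial_statistics(x, lag=1):
    """Sums of (1.1) for a sample x, using the circular rule x[n+i] = x[i]."""
    n = len(x)
    mean = sum(x) / n
    d = [xi - mean for xi in x]  # deviations from the sample mean
    return {
        # non-circular sums stop at i = n-1
        "delta2_n_minus_1": sum((x[i + 1] - x[i]) ** 2 for i in range(n - 1)),
        "C_n_minus_1": sum(d[i] * d[i + 1] for i in range(n - 1)),
        # circular sums wrap around with the modulo index
        "delta2_n": sum((x[(i + 1) % n] - x[i]) ** 2 for i in range(n)),
        "C_n": sum(d[i] * d[(i + 1) % n] for i in range(n)),
        "LC_n": sum(d[i] * d[(i + lag) % n] for i in range(n)),
        "V_n": sum(di * di for di in d),
    }


stats = serial_statistics([1.0, 3.0, 2.0, 5.0, 4.0], lag=2)
ratio = stats["C_n"] / stats["V_n"]  # circular serial correlation, lag 1
```

For the circular sums the identity $\delta_n^2 = 2V_n - 2C_n$ holds exactly, which gives a quick consistency check on any implementation.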

Von Neumann [2] has studied the ratio $\eta = \delta_{n-1}^2/V_n$. He obtains an expression for the sampling distribution of the ratio $\eta$. He solves the equivalent problem of determining the distribution of $\sum_{i=1}^{n-1} A_i y_i^2$ where the point $(y_1, y_2, \cdots, y_{n-1})$ is uniformly distributed over the spherical surface $\sum_{i=1}^{n-1} y_i^2 = 1$ and the $A_i$ are the characteristic values of $\delta_{n-1}^2$. He obtains the distribution $\omega(\gamma)$ of $\gamma = \sum_{i=1}^{m} B_i x_i^2$ ($m$ even) where the point $(x_1, x_2, \cdots, x_m)$ is uniformly distributed over the spherical surface $\sum_{i=1}^{m} x_i^2 = 1$ and $B_1 > B_2 > \cdots > B_m$. $\omega(\gamma)$ is found by solving the equation

(1.2) $$\int_{B_m}^{B_1} \frac{\omega(\gamma)}{(z - \gamma)^{m/2}}\, d\gamma = \prod_{i=1}^{m} \frac{1}{\sqrt{z - B_i}}.$$
The distribution of $\eta$ is then a special case of this distribution. The first four moments were obtained by Williams [5] by the use of a generating function. In the present paper we shall study the ratio $\delta_n^2/V_n$. The moments of this ratio will be developed, together with the moments and approximate distribution of $[2 - \delta_n^2/V_n]^2$.

Von Neumann [4], in a paper which removed a restriction (that $m$ be even) on the distribution of $\eta$, indicates how to determine the distribution of $C_{n-1}/V_n$. Koopmans [9] considers the stochastic process $x_t = \rho x_{t-1} + z_t$ $(t = 1, 2, \cdots)$, $|\rho| < 1$. The $z_t$ are independent drawings from a normal distribution with zero mean and variance $\sigma^2$. To test the hypothesis that $\rho = 0$ he shows that it is sufficient to know the distribution of $C_{n-1}/V_n$. He finds the distribution of $C_{n-1}/V_n$ and $C_n/V_n$ but finds that the numerical computation of these functions is very cumbersome. This prompts him to obtain approximate formulas for these distributions. The approximate formula for the distribution of $\bar{r} = C_n/V_n$ is


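(1.3) $$\left(\tfrac{1}{2}n - 1\right) 2^{n/2} \pi^{-1} \int_0^{\arccos \bar{r}} (\cos\alpha - \bar{r})^{n/2 - 2} \sin \tfrac{1}{2}n\alpha \, \sin\alpha \, d\alpha.$$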


A similar approximation will be used in this paper to find the moments of $C_n/V_n$. It will be shown how good the approximation is and how by using this approximation we may obtain a tabled function (Pearson Type I) which fits the distribution of $1 - (C_n/V_n)^2$ up to $\tfrac{1}{2}n$ moments.

The quantity $1 - (C_n/V_n)^2$, we shall find, is equivalent to a likelihood ratio function for testing the hypothesis that the serial correlation is zero.

Anderson [8] obtained the distribution of ${}_LC_n/V_n = {}_LR_n$. He proved that the distribution of ${}_LR_n$ is the same as that of ${}_1R_n$ when $L$ and $n$ are prime to each other. He has computed the 1 per cent and 5 per cent significance values ($L = 1$) up to $n = 75$. For values of $n > 75$ he indicates that a normal distribution which is an asymptotic approximation may be used. He has also computed some significance values for the cases of $N/L = 2, 3, 4$.

In this paper we shall develop the moments of ${}_LR_n$.

The use of the ratio $\eta$ in the study of serial effects in ballistics at Aberdeen Proving Ground is given in references [1] and [2]. The use of the ratios $C_{n-1}/V_n$ and $C_n/V_n$ in the study of economic time series is discussed by Koopmans [9].


2. Likelihood criteria. Given a sample of $n$ observations $x_1, x_2, \cdots, x_n$ we shall assume that they are distributed according to the law:

(2.1) $$dP_n = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \exp\left\{-\frac{1}{2\sigma^2}\sum_{\alpha=1}^{n} (x_\alpha - a - b x_{\alpha-l})^2\right\} dx_1 \cdots dx_n.$$

It will be convenient to use the phraseology that "the variate at the time $\alpha$ has as its mean value a linear function of the variate at the time $\alpha - l$." We shall take $x_i = x_{n+i}$. Due to the symmetry we may use $\alpha + l$ in place of $\alpha - l$. This will be done to obtain agreement with previous work. We wish to test the hypothesis $H_1$ that each variate is independent of the other variates, that is, that $b = 0$. The Neyman-Pearson specification of $H_1$ may be written as follows,





where $\Omega$ is the space of admissible values of $\sigma^2$, $a$ and $b$, and $\omega$ the subspace defining $H_1$:

(2.2) $$\Omega:\ \sigma^2 > 0,\ -\infty < a, b < \infty; \qquad \omega:\ \sigma^2 > 0,\ -\infty < a < \infty,\ b = 0.$$

The likelihood criterion $\lambda_1$ suitable to this hypothesis is the ratio of the maximum ($\omega$ (max.)) of (2.1) with the restriction that $b = 0$ to the maximum ($\Omega$ (max.)) of (2.1) without this condition. Now,

(2.3) $$\lambda_1 = \frac{dP_n(\omega\ \max)}{dP_n(\Omega\ \max)}.$$
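We see that the likelihood function is

(2.4) $$L = -n \log(\sqrt{2\pi}\,\sigma) - \frac{1}{2\sigma^2} \sum_{\alpha=1}^{n} (x_\alpha - a - b x_{\alpha+l})^2$$

and to maximize $L$ over the space $\Omega$ we compute the following derivatives

(2.5)
$$\frac{\partial L}{\partial a} = \frac{1}{\sigma^2} \sum (x_\alpha - a - b x_{\alpha+l}),$$
$$\frac{\partial L}{\partial b} = \frac{1}{\sigma^2} \sum (x_\alpha - a - b x_{\alpha+l})\, x_{\alpha+l},$$
$$\frac{\partial L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum (x_\alpha - a - b x_{\alpha+l})^2.$$

The solutions of the equations obtained by setting the above derivatives equal to zero are:

(2.6)
$$\hat{\sigma}^2 = \frac{1}{n} \sum (x_\alpha - \hat{a} - b x_{\alpha+l})^2,$$
$$\hat{a} = \frac{1}{n} \sum x_\alpha (1 - b),$$
$$b = \frac{n \sum x_\alpha x_{\alpha+l} - (\sum x_\alpha)^2}{n \sum x_\alpha^2 - (\sum x_\alpha)^2}.$$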

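If we now maximize $L$ over the space $\omega$ we obtain

(2.7) $$\hat{\sigma}^2 = \frac{1}{n} \sum (x_\alpha - \bar{x})^2, \qquad \hat{a} = \bar{x} = \frac{1}{n} \sum x_\alpha,$$

so that we will have

(2.8) $$dP_n(\Omega\ \max) = \left[\frac{2\pi}{n}(1 - b^2) \sum (x_\alpha - \bar{x})^2\right]^{-n/2} e^{-n/2},$$

(2.9) $$dP_n(\omega\ \max) = \left[\frac{2\pi}{n} \sum (x_\alpha - \bar{x})^2\right]^{-n/2} e^{-n/2},$$

(2.10) $$\lambda_1 = (1 - b^2)^{n/2},$$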





where $b$ is defined as in (2.6) above. If we set $a = 0$ in (2.1) we may follow a similar procedure and find the criterion ${}_0\lambda_1 = (1 - b_0^2)^{n/2}$ for testing the hypothesis that $b = 0$ if it is known that the population mean equals zero. Here

(2.11) $$b_0 = \frac{\sum x_\alpha x_{\alpha+l}}{\sum x_\alpha^2}.$$

We notice that $b$ is the criterion chosen by R. L. Anderson as a measure of serial correlation. He has obtained the distribution of $b$ and has computed a number of significance values from this distribution. The distribution is a function of $n$ and $l$, and Anderson points out that the larger the values of $n$ the smaller the extent to which the significance values depend on $l$.

In the next section we shall find a distribution which approximates very closely the exact distribution of $b$ and which is independent of the lag $l$.
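As a numerical illustration of (2.6)-(2.10), the following sketch computes $b$, the two maximum likelihood variances and $\lambda_1$ for a small sample under the circular convention $x_{n+i} = x_i$. The function name and return layout are assumptions made for this example only.

```python
def likelihood_criterion(x, lag=1):
    """Return (b, lambda_1, var_Omega, var_omega) following (2.6)-(2.10)."""
    n = len(x)
    s1 = sum(x)
    sxx = sum(v * v for v in x)
    sxl = sum(x[i] * x[(i + lag) % n] for i in range(n))
    b = (n * sxl - s1 ** 2) / (n * sxx - s1 ** 2)       # (2.6)
    a_hat = s1 * (1 - b) / n                            # (2.6)
    # variance estimate maximizing over Omega (b free)
    var_Omega = sum(
        (x[i] - a_hat - b * x[(i + lag) % n]) ** 2 for i in range(n)
    ) / n
    # variance estimate maximizing over omega (b = 0), as in (2.7)
    var_omega = sum((v - s1 / n) ** 2 for v in x) / n
    lam = (1 - b * b) ** (n / 2)                        # (2.10)
    return b, lam, var_Omega, var_omega


b, lam, var_Omega, var_omega = likelihood_criterion([1.0, 3.0, 2.0, 5.0, 4.0])
```

The residual variance under $\Omega$ equals $(1 - b^2)$ times the variance under $\omega$, which is exactly why the ratio (2.3) collapses to $(1 - b^2)^{n/2}$.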















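3. Moments of the likelihood criteria. We shall determine the moments of $b_0$ and ${}_0\lambda_1$ when the hypothesis ${}_0H_1$ is true and the moments of $b$ and $\lambda_1$ when the hypothesis $H_1$ is true. Let us first consider $b_0 = \sum x_\alpha x_{\alpha+l} / \sum x_\alpha^2$, the criterion we obtained for testing the hypothesis ${}_0H_1$. The moment generating function for the joint distribution of $C_0 = \sum x_\alpha x_{\alpha+l}/\sigma^2$ and $V_0 = \sum x_\alpha^2/\sigma^2$ is

(3.1) $${}_0\varphi_1(t_1, t_2) = E[\exp(C_0 t_1 + V_0 t_2)] = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \int \cdots \int \exp\left(C_0 t_1 + V_0 t_2 - \frac{\sum x_\alpha^2}{2\sigma^2}\right) \prod_{\alpha=1}^{n} dx_\alpha.$$

By reference to this last expression it can be seen that

(3.2) $$E(b_0) = \int_{-\infty}^{0} \left[\frac{\partial\, {}_0\varphi_1}{\partial t_1}\right]_{t_1 = 0} dt_2,$$

and further

(3.3) $$E(b_0^k) = \int_{-\infty}^{0} \cdots \int_{-\infty}^{0} \left[\frac{\partial^k\, {}_0\varphi_1(t_1, t_2)}{\partial t_1^k}\right]_{t_1 = 0} \prod_{j=1}^{k} dt_{2j},$$

in which we set $t_2 = \sum_{j=1}^{k} t_{2j}$.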


j= 
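Now if we write (3.1) as follows:

(3.4) $$\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \int \cdots \int \exp\left(-\frac{1}{2\sigma^2} \sum_{i,j=1}^{n} A_{ij} x_i x_j\right) \prod_{i=1}^{n} dx_i,$$

we see that $\varphi(t_1, t_2) = |A_{ij}|^{-1/2}$ and $A_{ij} = A_{i+\alpha,\, j+\alpha}$; that is, this determinant is a circulant. Let us write $A_{ij} = a_h$ where $h$ equals $j - i + 1$ or $j - i + 1 + n$, taking that subscript which gives $1 \le h \le n$, so that we have

(3.5) $$\varphi^{-2}(t_1, t_2) = \begin{vmatrix} a_1 & a_2 & a_3 & \cdots & a_n \\ a_n & a_1 & a_2 & \cdots & a_{n-1} \\ \vdots & & & & \vdots \\ a_2 & a_3 & a_4 & \cdots & a_1 \end{vmatrix}$$

which expanded by the method of circulants becomes

$$\prod_{k=1}^{n} \sum_{i=1}^{n} a_i\, \omega_k^{\,i-1},$$

where the $\omega_k$ are the $n$th roots of unity.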
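The expansion of a circulant by roots of unity can be checked numerically. The sketch below builds the circulant of (3.5) for lag 1 (with $a_1 = 1 - 2t_2$, $a_2 = a_n = -t_1$) and compares a direct determinant with the product over the $n$th roots of unity. The helper names `det`, `circulant` and `circulant_det_by_roots` are hypothetical, and the elimination routine is a plain textbook method, not anything taken from the paper.

```python
import cmath


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-300:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result          # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return result


def circulant(first_row):
    """Matrix with A[i][j] = a_h, h = j - i + 1 (mod n), as in (3.5)."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]


def circulant_det_by_roots(first_row):
    """Product over the nth roots of unity of sum_i a_i * w^(i-1)."""
    n = len(first_row)
    prod = 1 + 0j
    for k in range(1, n + 1):
        w = cmath.exp(2j * cmath.pi * k / n)
        prod *= sum(a * w ** i for i, a in enumerate(first_row))
    return prod.real


t1, t2, n = 0.1, -0.2, 6
row = [1 - 2 * t2, -t1] + [0.0] * (n - 3) + [-t1]
direct = det(circulant(row))
by_roots = circulant_det_by_roots(row)
```

Both routes give the same value up to rounding, which is the content of the expansion used in the text.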
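First we shall consider $b_0$ (lag 1). Here $a_1 = 1 - 2t_2$, $a_2 = a_n = -t_1$ and $a_3$ to $a_{n-1}$ equal zero.

(3.6) $${}_0\varphi_1^{-2}(t_1, t_2) = \prod_{k=1}^{n} \left(1 - 2t_2 - t_1(\omega_k + \omega_k^{-1})\right) = \prod_{k=1}^{n} \left(1 - 2t_2 - 2t_1 \cos\frac{2\pi k}{n}\right).$$

For lag $l$, $a_1 = 1 - 2t_2$, $a_{l+1} = a_{n-l+1} = -t_1$ and the remaining $a$'s $= 0$.

(3.7) $${}_0\varphi_1^{-2}(t_1, t_2) = \prod_{k=1}^{n} \left(1 - 2t_2 - t_1(\omega_k^{\,l} + \omega_k^{-l})\right) = \prod_{k=1}^{n} \left(1 - 2t_2 - 2t_1 \cos\frac{2\pi l k}{n}\right).$$

We shall develop an approximation to these functions (3.6) and (3.7) as follows: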

(3.8) $${}_0\varphi_1^{-2}(t_1, t_2) = \prod_{k=1}^{n} (A + B \cos\alpha_k) = \exp\left(\sum_{k=1}^{n} \log(A + B \cos\alpha_k)\right),$$

where in this case $A = 1 - 2t_2$, $B = -2t_1$ and $\alpha_k = 2\pi l k/n$. Let us now alter the form of this exponent and replace the sum of a finite number of terms involving $\alpha_k$ by an integral of a continuous variable $\alpha$.

(3.9) $${}_0\varphi_1(t_1, t_2) = \exp\left(-\frac{1}{2} \sum_{k=1}^{n} \log(A + B \cos\alpha_k)\right).$$

Let us write $2\pi l/n = \Delta\alpha_k$, then we shall have

(3.10) $${}_0\varphi_1(t_1, t_2) = \exp\left(-\frac{n}{4\pi l} \sum_{k=1}^{n} \log(A + B \cos\alpha_k)\, \Delta\alpha_k\right).$$

If we take $|B| < A$ we see that $A + B \cos\alpha_k$ is never negative; therefore as we let $n$ increase the summation will approach the value of the integral $\int_0^{2\pi l} \log(A + B \cos\alpha)\, d\alpha$. Let us then replace this summation by its limiting integral. The resulting function will then be an approximation for $n$ large enough. We shall obtain then

(3.11) $${}_0\varphi_1 \sim \exp\left(-\frac{n}{4\pi l} \int_0^{2\pi l} \log(A + B \cos\alpha)\, d\alpha\right), \qquad |B| < A,$$

from which by the use of the integral¹ we obtain

(3.12) $${}_0\varphi_1(t_1, t_2) \sim \exp\left(-\tfrac{1}{2} n \log \tfrac{1}{2}\left(A + \sqrt{A^2 - B^2}\right)\right).$$

¹ This integral may be verified by differentiation with respect to the parameter $a$. It may be found as formula 523 in Peirce's "Short Table of Integrals."





So that

(3.13) $${}_0\varphi_1(t_1, t_2) \sim \left[\tfrac{1}{2}\left(A + \sqrt{A^2 - B^2}\right)\right]^{-n/2}.$$

We now have $\varphi(t_1, t_2)$ represented approximately by a power of a single quantity. The question of how good this approximation is will be discussed in a later paragraph. This is similar to the approximation used by T. Koopmans in the distribution of $b_0$. He makes the substitution "to obtain what intuitively seems to be in some sense the closest approximation." He approximates $\prod_{t=1}^{T} (\kappa - \kappa_t)$, where $\kappa_t = \cos(2\pi t/T)$, by the process followed in (3.8)-(3.13) in order to find the distribution given in (1.3).

To obtain the corresponding function for

(3.14) $$b = \frac{\sum x_\alpha x_{\alpha+l} - (\sum x_\alpha)^2/n}{\sum x_\alpha^2 - (\sum x_\alpha)^2/n}$$

we follow the same procedure as above for $b_0$. Here

$$C = \left[\sum x_\alpha x_{\alpha+l} - (\sum x_\alpha)^2/n\right]/\sigma^2, \qquad V = \left[\sum x_\alpha^2 - (\sum x_\alpha)^2/n\right]/\sigma^2,$$

and in (3.5) $a_1 = 1 - 2t_2 + 2(t_1 + t_2)/n$; $a_2 = a_n = -t_1 + 2(t_1 + t_2)/n$; and all the other $a$'s $= 2(t_1 + t_2)/n$, so that the expansion of this circulant becomes

(3.15) $$\varphi_1^{-2}(t_1, t_2) = \prod_{k=1}^{n} \left\{\left[1 - 2t_2 + \frac{2(t_1 + t_2)}{n}\right] + \left[-t_1 + \frac{2(t_1 + t_2)}{n}\right](\omega_k + \omega_k^{-1}) + \frac{2(t_1 + t_2)}{n} \sum_{i=3}^{n-1} \omega_k^{\,i-1}\right\},$$

and since

$$\sum_{i=3}^{n-1} \omega_k^{\,i-1} = \begin{cases} -(\omega_k + \omega_k^{-1} + 1), & k \ne n, \\ n - 3, & k = n, \end{cases}$$

we get

(3.16) $$\prod_{k=1}^{n-1} \left\{1 - 2t_2 - t_1(\omega_k + \omega_k^{-1})\right\},$$

and for lag $l$ we get

(3.17) $$\prod_{k=1}^{n-1} \left\{1 - 2t_2 - t_1(\omega_k^{\,l} + \omega_k^{-l})\right\}.$$

These two results are the same as those obtained previously except that the final term, $1 - 2t_2 - 2t_1 = A + B$, of the products is missing. We will then obtain as an approximation to these finite products $\varphi_1^{-2} = {}_0\varphi_1^{-2}/(A + B)$ or

(3.18) $$\varphi_1(t_1, t_2) \sim \left[\tfrac{1}{2}\left(A + \sqrt{A^2 - B^2}\right)\right]^{-n/2} \sqrt{A + B},$$

where $A = 1 - 2t_2$ and $B = -2t_1$.
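To get a feeling for how close the approximations (3.13) and (3.18) are, one can compare the exact finite products (3.7) and (3.17) with the corresponding powers of $\tfrac{1}{2}(A + \sqrt{A^2 - B^2})$. The following is a minimal sketch; the function names and the test point $t_1 = 0.1$, $t_2 = -0.2$ are arbitrary choices made here.

```python
import math


def exact_product(t1, t2, n, lag=1, drop_last=False):
    """Exact phi^(-2): product (3.7), or (3.17) when drop_last is True."""
    last = n - 1 if drop_last else n
    p = 1.0
    for k in range(1, last + 1):
        p *= 1 - 2 * t2 - 2 * t1 * math.cos(2 * math.pi * lag * k / n)
    return p


def approx_product(t1, t2, n, drop_last=False):
    """Approximate phi^(-2) implied by (3.13), or by (3.18) when drop_last."""
    A, B = 1 - 2 * t2, -2 * t1
    u = 0.5 * (A + math.sqrt(A * A - B * B))
    value = u ** n
    if drop_last:
        value /= A + B   # the k = n factor A + B is missing in (3.17)
    return value


for n in (4, 10, 20):
    ratio = exact_product(0.1, -0.2, n) / approx_product(0.1, -0.2, n)
    print(n, ratio - 1.0)   # relative error shrinks rapidly with n
```

At this test point the relative error is already far below rounding-level interest for $n = 10$, in line with the later statement that the neglected part of the generating function only affects derivatives of order $n$ or higher.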





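A method of finding the moments of $b_0$ and $b$ was outlined in (3.1), (3.2) and (3.3) above. If we perform these operations on ${}_0\varphi_1(t_1, t_2)$ as defined in (3.13) we find

(3.19)
$${}_0\varphi_1 = u^{-n/2},$$
$$\frac{\partial\, {}_0\varphi_1}{\partial t_1} = -\tfrac{1}{2} n\, u^{-n/2 - 1} \frac{\partial u}{\partial t_1},$$
$$\frac{\partial^2\, {}_0\varphi_1}{\partial t_1^2} = -\tfrac{1}{2} n\, u^{-n/2 - 2} \left[-\left(\tfrac{1}{2}n + 1\right)\left(\frac{\partial u}{\partial t_1}\right)^2 + u \frac{\partial^2 u}{\partial t_1^2}\right],$$

where $u = \tfrac{1}{2}(A + \sqrt{A^2 - B^2})$, $u|_0 = 1 - 2t_2$, $\left.\dfrac{\partial u}{\partial t_1}\right|_0 = 0$, $\left.\dfrac{\partial^2 u}{\partial t_1^2}\right|_0 = -\dfrac{2}{1 - 2t_2}$, etc., and the zero subscript indicates that $t_1$ has been set equal to zero after differentiation. If these values are substituted and the required number of integrations with respect to $t_2$ are performed, we find the moments of the criterion $b_0$ when ${}_0H_1$ is true.

(3.20)
$$M_1 = 0, \qquad M_2 = \frac{1}{n + 2},$$
$$M_3 = 0, \qquad M_4 = \frac{3}{(n + 2)(n + 4)},$$
$$M_5 = 0, \qquad M_6 = \frac{15}{(n + 2)(n + 4)(n + 6)},$$

etc., or

$$M_{2k-1} = 0, \qquad M_{2k} = \frac{1 \cdot 3 \cdot 5 \cdots (2k - 1)}{(n + 2)(n + 4) \cdots (n + 2k)}.$$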
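The even moments of $b_0$ can be checked by simulation. The sketch below draws normal samples with mean zero, computes the circular lag 1 ratio $b_0$, and compares the empirical second and fourth moments with $M_2$ and $M_4$. The sample size $n = 10$, the number of repetitions and the seed are arbitrary choices made for this illustration.

```python
import random


def b0(x):
    """Circular lag 1 ratio sum(x_a * x_{a+1}) / sum(x_a^2)."""
    n = len(x)
    num = sum(x[i] * x[(i + 1) % n] for i in range(n))
    den = sum(v * v for v in x)
    return num / den


def simulated_moments(n, reps, seed=0):
    """Monte Carlo estimates of E(b0^2) and E(b0^4) under the null hypothesis."""
    rng = random.Random(seed)
    m2 = m4 = 0.0
    for _ in range(reps):
        r = b0([rng.gauss(0.0, 1.0) for _ in range(n)])
        m2 += r * r
        m4 += r ** 4
    return m2 / reps, m4 / reps


def formula_even_moment(n, k):
    """M_2k = 1*3*5...(2k-1) / ((n+2)(n+4)...(n+2k))."""
    value = 1.0
    for j in range(1, k + 1):
        value *= (2 * j - 1) / (n + 2 * j)
    return value


m2_hat, m4_hat = simulated_moments(n=10, reps=20000, seed=0)
print(m2_hat, formula_even_moment(10, 1))
print(m4_hat, formula_even_moment(10, 2))
```

A simulation of this size only confirms the formulas to two or three decimal places; it is meant as a sanity check, not as a substitute for the derivation.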


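$M_{2k}$ may be verified by the use of an expansion of the generating function (3.13) by a method of Laplace [10]. He gives the expansion of $u^{-i}$, where $u$ is given by the equation $u^2 - 2u + \epsilon^2 = 0$, as follows:

$$u^{-i} = \frac{1}{2^i} + \frac{i\,\epsilon^2}{2^{i+2}} + \frac{i(i + 3)\,\epsilon^4}{1 \cdot 2 \cdot 2^{i+4}} + \frac{i(i + 4)(i + 5)\,\epsilon^6}{1 \cdot 2 \cdot 3 \cdot 2^{i+6}} + \cdots + \frac{i(i + k + 1) \cdots (i + 2k - 1)\,\epsilon^{2k}}{k!\; 2^{i+2k}} + \cdots.$$

We see that $u = 1 + \sqrt{1 - \epsilon^2}$, and if we set $\epsilon = t_1/(\tfrac{1}{2} - t_2)$ and $i = \tfrac{1}{2}n$, we obtain ${}_0\varphi_1 = [(\tfrac{1}{2} - t_2)\,u]^{-n/2}$ as a series in the even powers of $t_1$. From this we can see that the odd moments are zero and from the form of the coefficients we can verify $M_{2k}$.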

These moments are not contained in the works of the other authors writing on this subject. Although these moments are obtained from an approximate generating function they are, as will be shown later, the exact, not approximate, moments for $k < n$ for lag 1, and are the exact moments for $k < n/\alpha$ for any lag $l$, where $\alpha$ is the largest common factor of $n$ and the lag $l$.

To obtain the moments of $b$ we follow a similar procedure with $\varphi_1 = u^{-n/2}(1 - 2t_1 - 2t_2)^{1/2}$, as in (3.18). Differentiating $\varphi_1$ the required number of times with respect to $t_1$ and integrating an equal number of times with respect to $t_2$ gives the following moments:

(3.21)
$$M_1 = \frac{-1}{n - 1}, \qquad M_2 = \frac{1}{n + 1},$$
$$M_3 = \frac{-3}{(n - 1)(n + 3)}, \qquad M_4 = \frac{3}{(n + 1)(n + 3)},$$
$$M_{2k-1} = \frac{-1 \cdot 3 \cdot 5 \cdots (2k - 1)}{(n - 1)(n + 3)(n + 5) \cdots (n + 2k - 1)},$$
$$M_{2k} = \frac{1 \cdot 3 \cdot 5 \cdots (2k - 1)}{(n + 1)(n + 3) \cdots (n + 2k - 1)}.$$


Examination of the moments of $b_0$ will show that $b_0 = x$ is distributed according to the law

(3.22) $$K_1 (1 - x^2)^{(n-1)/2} = K_1 (1 + x)^{(n-1)/2} (1 - x)^{(n-1)/2}$$

up to $n$ moments. This distribution is symmetric and we may wish a normal approximation to this curve. The mean is zero and the variance is $1/(n + 2)$.

The $\lambda$ criterion ${}_0\lambda_1^{2/n} = (1 - b_0^2) = y$ is distributed according to the law

(3.23) $$K_2 (1 - y)^{-1/2}\, y^{(n-1)/2}$$

up to $\tfrac{1}{2}n$ moments. Here the mean is $\dfrac{n + 1}{n + 2}$ and the variance is $\dfrac{2(n + 1)}{(n + 2)^2 (n + 4)}$.

If we inspect the moments of $b$ we see that the distribution of $\lambda_1^{2/n} = (1 - b^2) = z$ follows the law

(3.24) $$K_3 (1 - z)^{-1/2}\, z^{(n-2)/2}$$

up to $\tfrac{1}{2}n$ moments, which is the same as the distribution immediately preceding except that $n$ is replaced by $n - 1$. The distributions (3.22), (3.23), and (3.24) are the same for lag $l$ except that the fit is up to $n/\alpha$, $n/2\alpha$, and $n/2\alpha$ moments respectively, where $\alpha$ is the largest common factor of $l$ and $n$. These restrictions are necessary since the moments as given in (3.20) and (3.21) are obtained from the approximate generating functions (3.13) and (3.18). The exact generating function is given in (4.8) for lag 1 and it is found that the $n$th or higher derivatives bring contributions from the part of the generating function which was neglected in approximating the generating function. The additional restriction for lag $l \ne 1$ will be seen in the last two paragraphs of section 4. The extra factor $\tfrac{1}{2}$ in the second and third case above is due to the fact that only the even moments of (3.20) and (3.21) are used.

We have in (3.23) and (3.24), then, very close approximations to the distributions for the two $\lambda$ criteria for testing serial effects.
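The stated means and variances follow from reading (3.23) and (3.24) as beta densities, with parameters $((n+1)/2, 1/2)$ and $(n/2, 1/2)$ respectively. The sketch below verifies this with exact rational arithmetic; the function names are hypothetical, and the beta moment formulas used are the standard ones rather than anything specific to the paper.

```python
from fractions import Fraction


def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) variate on (0, 1)."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


def check_lambda_moments(n):
    """Compare (3.23) and (3.24) with the moments quoted in the text."""
    # y = 1 - b0^2, density K2 (1-y)^(-1/2) y^((n-1)/2): Beta((n+1)/2, 1/2)
    mean_y, var_y = beta_mean_var(Fraction(n + 1, 2), Fraction(1, 2))
    ok_y = (mean_y == Fraction(n + 1, n + 2)
            and var_y == Fraction(2 * (n + 1), (n + 2) ** 2 * (n + 4)))
    # z = 1 - b^2, density K3 (1-z)^(-1/2) z^((n-2)/2): Beta(n/2, 1/2)
    mean_z, _ = beta_mean_var(Fraction(n, 2), Fraction(1, 2))
    ok_z = mean_z == 1 - Fraction(1, n + 1)   # 1 - M_2 from (3.21)
    return ok_y and ok_z


print(all(check_lambda_moments(n) for n in range(3, 50)))
```

Using `Fraction` avoids any rounding question, so agreement here is an exact algebraic statement for each $n$ tried.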

The following table is a comparison of the exact and approximate 5 per cent and 1 per cent points for the distribution of $b$. The exact values are taken from the table given by R. L. Anderson. The normal approximation as given by Anderson in his table does not show such close agreement since he used an asymptotic second moment. He indicated that the exact values would have to be used for values of $n < 75$ in place of the values from the normal approximation which he obtained. Here we see that the normal approximation may be used for $n$ somewhat less than 75. The Pearson Type I approximation was obtained by using the first two moments of $b$. The curve obtained is:

(3.25) $$\frac{(1 + x)^{p-1} (1 - x)^{q-1}}{B(p, q)\, 2^{p+q-1}},$$

in which $p = \dfrac{(n - 1)(n - 2)}{2(n - 3)}$ and $q = \dfrac{n(n - 1)}{2(n - 3)}$.
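The parameters of the Type I curve are fixed by matching the first two moments of $b$ from (3.21), namely $-1/(n-1)$ and $1/(n+1)$. The sketch below checks this with exact fractions. The expression used for $q$ is the one obtained by solving the two moment equations and should be read as an assumption of this example; `type_one_parameters` and `type_one_moments` are hypothetical names.

```python
from fractions import Fraction


def type_one_parameters(n):
    """p and q of the curve (1+x)^(p-1) (1-x)^(q-1) on (-1, 1)."""
    p = Fraction((n - 1) * (n - 2), 2 * (n - 3))
    q = Fraction(n * (n - 1), 2 * (n - 3))
    return p, q


def type_one_moments(p, q):
    """Mean and second moment of x = 2y - 1 with y distributed Beta(p, q)."""
    s = p + q
    mean = (p - q) / s
    var = 4 * p * q / (s * s * (s + 1))
    return mean, var + mean * mean


for n in (5, 10, 20, 75):
    p, q = type_one_parameters(n)
    mean, second = type_one_moments(p, q)
    print(n, p, q, mean, second)
```

For $n = 5$ this gives $p = 3$ and $q = 5$, with mean $-1/4$ and second moment $1/6$.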


The exact values marked with an asterisk in the table differ slightly from those previously published. They are more precise values from the exact distribution which R. L. Anderson has made available to the author.


                         Positive tail

                5%                          1%
  N     Exact   Type I  Normal      Exact   Type I  Normal

  5     .253    .317    .281        .297    .527    .501
 10     .360    .362    .350        .525    .533    .541
 15     .328    .329    .323        .475    .477    .486
 20     .299    .299    .296        .432    .433    .440
 25     .276    .276    .274        .398    .398    .404
 30     .257    .257    .255        .370    .371    .376
 45     .218    .218    .217        .313*   .315    .316
 75     .174*   .174    .174        .250    .250    .251

                         Negative tail

                5%                          1%
  N     Exact   Type I  Normal      Exact   Type I  Normal

  5     .753    .742    .781        .798    .858    1.000
 10     .564    .562    .572        .705    .702    .763
 15     .462    .461    .466        .597    .596    .629
 20     .399    .399    .401        .524    .524    .545
 25     .356    .356    .357        .473    .473    .487
 30     .324*   .324    .324        .433    .433    .444
 45     .262    .262    .262        .356    .356    .362
 75     .201*   .201    .201        .276    .276    .278
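The "Normal" columns can be reproduced from the first two moments of $b$ in (3.21): mean $-1/(n-1)$ and variance $1/(n+1) - 1/(n-1)^2$. The following sketch does this with the standard library; `normal_points` is a hypothetical helper name, and the negative tail is returned with its sign, whereas the table above is read here as listing absolute values.

```python
from statistics import NormalDist


def normal_points(n, tail_prob):
    """Upper and lower percentage points of the normal approximation to b."""
    mean = -1.0 / (n - 1)
    var = n * (n - 3) / ((n + 1) * (n - 1) ** 2)   # 1/(n+1) - mean^2
    z = NormalDist().inv_cdf(1.0 - tail_prob)
    sd = var ** 0.5
    return mean + z * sd, mean - z * sd


for n in (20, 45, 75):
    up5, low5 = normal_points(n, 0.05)
    up1, low1 = normal_points(n, 0.01)
    print(n, round(up5, 3), round(up1, 3), round(low5, 3), round(low1, 3))
```

The variance used here is the exact second central moment rather than an asymptotic one, which is the distinction drawn in the text between this normal approximation and the one tabulated by Anderson.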







4. Alternative expansions of the generating functions. In this section we shall determine the exact generating functions which were approximated in (3.12) and (3.16) and obtain these same approximations in another manner. This development will enable us to see how good the approximation is in the sense that it gives a certain number of exact moments. The determinant in (3.4) for lag 1 and mean zero can be written

(4.1) $$A_n = \begin{vmatrix} a & b & & & & b \\ b & a & b & & 0 & \\ & b & a & b & & \\ & & \ddots & \ddots & \ddots & \\ & 0 & & b & a & b \\ b & & & & b & a \end{vmatrix}$$

where $a = 1 - 2t_2$, $b = -t_1$, and all the other elements are zero. The $b$ in the upper right corner and lower left corner indicate the value $b$ in the $a_{1n}$ and $a_{n1}$ positions. Let us define the following determinants:
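(4.2) $$B_n = \begin{vmatrix} a & b & & & 0 \\ b & a & b & & \\ & \ddots & \ddots & \ddots & \\ & & b & a & b \\ b & & & b & a \end{vmatrix}, \qquad C_{n-1} = \begin{vmatrix} b & a & b & & \\ & b & a & b & \\ & & \ddots & \ddots & \ddots \\ & & & b & a \\ b & & & & b \end{vmatrix}, \qquad D_n = \begin{vmatrix} a & b & & & \\ b & a & b & & \\ & \ddots & \ddots & \ddots & \\ & & b & a & b \\ & & & b & a \end{vmatrix}.$$

We see that

(4.3)
$$A_n = B_n + (-1)^{n+1}\, b\, C_{n-1},$$
$$B_n = D_n + (-1)^{n+1}\, b^n,$$
$$C_{n-1} = b^{n-1} + (-1)^{n}\, b\, D_{n-2},$$

and $A_n$ can be expressed in terms of $D_n$ by substituting for $B_n$ and $C_{n-1}$ in the equation for $A_n$.

(4.4) $$A_n = D_n - b^2 D_{n-2} + 2(-1)^{n+1} b^n.$$

We can obtain an expression for $D_n$ if we expand this determinant by the first row. This gives

(4.5) $$D_n = a D_{n-1} - b^2 D_{n-2}.$$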




Since this is a second order difference equation, the solution may be written $D_n = k_1 u^n + k_2 v^n$ where $u$, $v$ are roots of the equation $x^2 - ax + b^2 = 0$. Now,

(4.6)
$$D_1 = u + v = k_1 u + k_2 v,$$
$$D_2 = u^2 + uv + v^2 = k_1 u^2 + k_2 v^2,$$

so that we can determine $k_1$ and $k_2$. We now see that

(4.7) $$D_n = \frac{u^{n+1} - v^{n+1}}{u - v},$$

which upon substitution in the equation for $A_n$ gives

(4.8) $$A_n = u^n + v^n + 2(-1)^{n+1} b^n = u^n + v^n - 2t_1^n,$$

where $u, v = \tfrac{1}{2}\left\{(1 - 2t_2) \pm \sqrt{(1 - 2t_2)^2 - 4t_1^2}\right\}$. Now ${}_0\varphi_1(t_1, t_2) = A_n^{-1/2}$ and it is easily seen from the form of $A_n$ directly above that derivatives up to the $n$th order with respect to $t_1$, in which $t_1$ is then set equal to zero, will be given by derivatives of $A_n \sim u^n$, and this is the approximation (3.13) found by other methods.
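The closed form (4.8) can be compared directly with the product form (3.6) of the same determinant. A minimal sketch follows; the function names and the test point $t_1 = 0.3$, $t_2 = -0.1$ are choices made for this example.

```python
import math


def product_form(t1, t2, n):
    """A_n as the product (3.6) over the nth roots of unity."""
    return math.prod(
        1 - 2 * t2 - 2 * t1 * math.cos(2 * math.pi * k / n)
        for k in range(1, n + 1)
    )


def closed_form(t1, t2, n):
    """A_n = u^n + v^n - 2 t1^n as in (4.8)."""
    a = 1 - 2 * t2
    root = math.sqrt(a * a - 4 * t1 * t1)
    u, v = 0.5 * (a + root), 0.5 * (a - root)
    return u ** n + v ** n - 2 * t1 ** n


for n in range(3, 10):
    print(n, product_form(0.3, -0.1, n), closed_form(0.3, -0.1, n))
```

Since $v^n$ and $t_1^n$ are of order $t_1^n$ or smaller near $t_1 = 0$, they cannot contribute to derivatives of order below $n$ at $t_1 = 0$, which is the reason $u^n$ alone reproduces the lower moments exactly.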

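The determinant in (3.4) for lag 1 and mean not equal to zero can be written

(4.9) $$A_n = \begin{vmatrix} a & b & c & \cdots & c & b \\ b & a & b & \cdots & c & c \\ c & b & a & \cdots & c & c \\ \vdots & & & \ddots & & \vdots \\ c & c & c & \cdots & a & b \\ b & c & c & \cdots & b & a \end{vmatrix}, \qquad \begin{aligned} a &= 1 - 2t_2 + 2(t_1 + t_2)/n, \\ b &= -t_1 + 2(t_1 + t_2)/n, \\ c &= 2(t_1 + t_2)/n. \end{aligned}$$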
b ba 
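Let us define the following determinants:

(4.10) $C_n$ is $A_n$ with the $b$ in the upper right corner replaced by $c$; $B_{n-1}$ is the minor of the upper right corner element of $A_n$; $E_n$ is $A_n$ with the $b$ in both corners replaced by $c$; and $D_n$ is $B_n$ with the $b$ in its lower left corner replaced by $c$, so that $D_n$ has $b$ in the principal diagonal, $a$ and $b$ in the two diagonals next above it, and $c$ elsewhere.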




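If we replace the $b$ in the upper right corner of $A_n$ by $c + (b - c)$ we obtain

(4.11) $$A_n = C_n + (-1)^{n+1}(b - c) B_{n-1}.$$

If we replace the $b$ in the lower left corner of $B_n$ and $C_n$ by $c + (b - c)$ we obtain

(4.12)
$$B_n = D_n + (-1)^{n+1}(b - c) E_{n-1},$$
$$C_n = E_n + (-1)^{n+1}(b - c) D_{n-1}.$$

We now have $A_n$ in terms of $D_n$ and $E_n$. We must now evaluate $D_n$ and $E_n$.

(4.13) $$D_n = \begin{vmatrix} 1 & 1 & 1 & 1 & \cdots & 1 \\ 0 & b & a & b & \cdots & c \\ 0 & c & b & a & \cdots & c \\ \vdots & & & \ddots & & \vdots \\ 0 & c & c & \cdots & b & a \\ 0 & c & c & \cdots & c & b \end{vmatrix}_{n+1} = \begin{vmatrix} 1 & 1 & 1 & 1 & \cdots & 1 \\ -c & r & s & r & \cdots & 0 \\ -c & 0 & r & s & \cdots & 0 \\ \vdots & & & \ddots & & \vdots \\ -c & 0 & 0 & \cdots & r & s \\ -c & 0 & 0 & \cdots & 0 & r \end{vmatrix}_{n+1}$$

where $r = b - c$ and $s = a - c$ and the second determinant above is obtained from the first by subtracting $c$ times the first row from all other rows. Writing this last determinant as the sum of two determinants by separating the first column we get

(4.14) $$D_n = r^n - c F_{n+1},$$

where $F_{n+1}$ denotes the second determinant of (4.13) with its first column replaced by $0, 1, 1, \cdots, 1$. Combining the above difference equations we obtain

(4.15) $$A_n = E_n - r^2 E_{n-2} + 2(-1)^{n} r c F_n + 2(-1)^{n+1} r^n$$

and see that we must obtain $E_n$ and $F_n$. Expansion of $F_{n+1}$ by the second column gives

(4.16) $$F_{n+1} = -G_n + r F_n$$

and expanding $G_n$ by the last row we get

(4.17) $$G_n = r G_{n-1} + (-1)^{n+1} H_{n-1},$$

in which

(4.18) $$G_n = \begin{vmatrix} 1 & s & r & & & \\ 1 & r & s & r & & \\ 1 & & r & s & \ddots & \\ \vdots & & & \ddots & \ddots & r \\ 1 & & & & r & s \\ 1 & & & & & r \end{vmatrix}_{n}, \qquad H_n = \begin{vmatrix} s & r & & & \\ r & s & r & & \\ & \ddots & \ddots & \ddots & \\ & & r & s & r \\ & & & r & s \end{vmatrix}_{n}.$$

$H_n$ is of the same type as the determinant $D_n$ of (4.5), therefore $H_n = \dfrac{u^{n+1} - v^{n+1}}{u - v}$ where $u$ and $v$ are the roots of the equation $x^2 - sx + r^2 = 0$, so that (4.17) becomes

(4.19) $$G_n - r G_{n-1} = (-1)^{n+1} \frac{u^n - v^n}{u - v}$$

and the solution of this equation gives

(4.20) $$G_n = \frac{r^n}{2r + s} + \frac{(-1)^{n+1}\left[r(u^n - v^n) + u^{n+1} - v^{n+1}\right]}{(u - v)(2r + s)}.$$

Introducing this expression into (4.16) we find

(4.21) $$F_n = \frac{(-1)^{n+1}(u^n - v^n)}{(u - v)(2r + s)} - \frac{n r^{n-1}}{2r + s}.$$

(4.22) $$E_n = \begin{vmatrix} 1 & 1 & 1 & 1 & \cdots & 1 \\ 0 & a & b & c & \cdots & c \\ 0 & b & a & b & \cdots & c \\ \vdots & & & \ddots & & \vdots \\ 0 & c & \cdots & b & a & b \\ 0 & c & \cdots & c & b & a \end{vmatrix}_{n+1} = \begin{vmatrix} 1 & 1 & 1 & 1 & \cdots & 1 \\ -c & s & r & 0 & \cdots & 0 \\ -c & r & s & r & \cdots & 0 \\ \vdots & & & \ddots & & \vdots \\ -c & 0 & \cdots & r & s & r \\ -c & 0 & \cdots & 0 & r & s \end{vmatrix}_{n+1}$$

where the second determinant is obtained from the first by subtracting $c$ times the first row from all other rows. If we separate this last determinant on the first column we get

(4.23) $$E_n = H_n - c I_{n+1},$$

where $I_{n+1}$ denotes the second determinant of (4.22) with its first column replaced by $0, 1, 1, \cdots, 1$, and we let $J_n$ denote the minor of $I_{n+1}$ obtained by deleting its last row and next to last column. Expanding $I_n$ by the last row, we get

(4.25) $$I_n = (-1)^{n+1} G_{n-1} - r J_{n-1} + s I_{n-1}.$$

Expanding $J_n$ by the last column, we get

(4.26) $$J_n = (-1)^{n+1} G_{n-1} + r I_{n-1}.$$

If we combine these last two equations we find

(4.27) $$I_n - s I_{n-1} + r^2 I_{n-2} = (-1)^{n+1}(G_{n-1} + r G_{n-2}).$$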





If we now solve this difference equation for $I_n$, substitute this solution in the equation for $E_n$, and in turn substitute this result and the expression we obtained for $F_n$ in (4.15), we get

(4.28) $$A_n = \frac{u^n + v^n + 2(-1)^{n+1} r^n}{2r + s} = \frac{u^n + v^n - 2t_1^n}{1 - 2t_1 - 2t_2}.$$

The final form results since $2r + s = 1 - 2t_1 - 2t_2$, $r = -t_1$. $u$ and $v$ have the same values as before. If we compare this result with that obtained in (4.8) for mean equal to zero we see that this is the same except that here we have the added factor, $1 - 2t_1 - 2t_2$, in the denominator. We have a similar result then for the approximation for derivatives of $\varphi_1(t_1, t_2) = A_n^{-1/2}$ for $t_1 = 0$. Here this approximation is $A_n \sim u^n/(1 - 2t_1 - 2t_2)$, the same result as that obtained in (3.18). This approximation will yield moments which are exact for $n > \alpha k$ for any lag $l$, where $\alpha$ is the largest common factor of $n$ and $l$. The reason for this restriction is easily seen if we consider the expansion obtained in (3.7), for

(4.29) $$\frac{\partial\, {}_0\varphi_1(t_1, t_2)}{\partial t_1} = {}_0\varphi_1(t_1, t_2) \sum_{k=1}^{n} \frac{\cos\dfrac{2\pi l k}{n}}{1 - 2t_2 - 2t_1 \cos\dfrac{2\pi l k}{n}},$$

with $t_1 = 0$,

(4.30) $$\left.\frac{\partial\, {}_0\varphi_1(t_1, t_2)}{\partial t_1}\right|_{t_1 = 0} = {}_0\varphi_1(0, t_2) \sum_{k=1}^{n} \frac{\cos\dfrac{2\pi l k}{n}}{1 - 2t_2}.$$

Further

(4.31) $$\frac{\partial^2\, {}_0\varphi_1}{\partial t_1^2} = \frac{\partial\, {}_0\varphi_1}{\partial t_1} \sum_{k=1}^{n} \frac{\cos\dfrac{2\pi l k}{n}}{1 - 2t_2 - 2t_1 \cos\dfrac{2\pi l k}{n}} + {}_0\varphi_1 \sum_{k=1}^{n} \frac{2\cos^2\dfrac{2\pi l k}{n}}{\left(1 - 2t_2 - 2t_1 \cos\dfrac{2\pi l k}{n}\right)^2},$$

and the $m$th partial derivative will contain the sum of the $m$th powers of the cosines. These are the sums of the powers of the real parts of the roots of unity and it is easily seen that $\sum_{k=1}^{n} \cos^m \dfrac{2\pi l k}{n} = \sum_{k=1}^{n} \cos^m \dfrac{2\pi k}{n}$ only for $m < n/\alpha$, where $\alpha$ is the largest common factor of $n$ and $l$.
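The restriction $m < n/\alpha$ on the cosine power sums is easy to see numerically. In the sketch below, `limiting_value` is $n$ times the mean of $\cos^m$ over a full period, which is the value that the integral approximation of section 3 effectively substitutes for the sum; both function names are hypothetical, and the choice $n = 12$ is arbitrary.

```python
import math


def cos_power_sum(n, lag, m):
    """Sum over k = 1..n of cos^m(2*pi*lag*k/n)."""
    return sum(math.cos(2 * math.pi * lag * k / n) ** m for k in range(1, n + 1))


def limiting_value(n, m):
    """n times the average of cos^m over a full period."""
    if m % 2:
        return 0.0
    return n * math.comb(m, m // 2) / 2 ** m


n = 12
for lag in (1, 4):                      # largest common factors 1 and 4
    alpha = math.gcd(n, lag)
    for m in range(1, n + 1):
        agree = abs(cos_power_sum(n, lag, m) - limiting_value(n, m)) < 1e-9
        print(lag, alpha, m, agree)     # agreement holds for m < n / alpha
```

With $n = 12$ and lag 4 the sums already disagree at $m = 3 = n/\alpha$, whereas for lag 1 they agree up to $m = 11$.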


To change the moment generating function of $b_0$ to that of $b$ we must drop the last term of the product. In the above expressions we then have $\sum_{k=1}^{n-1} \cos^m \dfrac{2\pi l k}{n}$ and the same conclusion will hold.







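5. Application to successive differences. If we change slightly the function $\eta = \delta_{n-1}^2/V_n$ investigated by von Neumann and Williams we find the moments and distribution greatly simplified. Let us define

(5.1) $$\delta_n^2 = \sum_{i=1}^{n} (x_i - x_{i+1})^2$$

where $x_{n+1} = x_1$, and consider the ratio ${}_0\eta_1$ of $\delta_n^2$ to $\sum x_i^2$.

(5.2) $$\delta_n^2 = 2\sum x_i^2 - 2\sum x_i x_{i+1},$$

therefore

(5.3) $${}_0\eta_1 = \frac{\delta_n^2}{\sum x_i^2} = 2(1 - b_0)$$

and we may find the moments and distribution of ${}_0\eta_1$ directly from those of $b_0$. We find the moments to be:

(5.4)
$$m_1 = 2, \qquad m_2 = \frac{2^2 (n + 3)}{n + 2},$$
$$m_3 = \frac{2^3 (n + 5)}{n + 2}, \qquad m_4 = \frac{2^4 (n + 5)(n + 7)}{(n + 2)(n + 4)},$$
$$m_k = \frac{2^k (n + 2k - 1)!}{(n + k)!\,(n + 2)(n + 4) \cdots (n + 2k - 2)} \qquad (k < n),$$

and the ratio ${}_0\eta_1$ is distributed according to the law

(5.5) $$C_1\, {}_0\eta_1^{(n-1)/2} (4 - {}_0\eta_1)^{(n-1)/2}$$

up to $n$ moments.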
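Because ${}_0\eta_1 = 2(1 - b_0)$, the moments of the ratio follow from those of $b_0$ in (3.20) by a binomial expansion. The sketch below carries this out with exact fractions and compares the result with a product form of the $k$th moment, $2^k \prod_{j=0}^{k-1} (n + 1 + 2j)/(n + 1 + j)$. That product form is an algebraically equivalent rewriting introduced for this example, and all function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def b0_moment(n, j):
    """Moments of b0 from (3.20): zero for odd j, a product for even j."""
    if j % 2:
        return Fraction(0)
    value = Fraction(1)
    for i in range(1, j // 2 + 1):
        value *= Fraction(2 * i - 1, n + 2 * i)
    return value


def eta_moment_from_b0(n, k):
    """E[(2(1 - b0))^k] expanded by the binomial theorem."""
    return 2 ** k * sum(
        comb(k, j) * (-1) ** j * b0_moment(n, j) for j in range(k + 1)
    )


def eta_moment_closed(n, k):
    """2^k * prod_{j<k} (n + 1 + 2j) / (n + 1 + j)."""
    value = Fraction(2 ** k)
    for j in range(k):
        value *= Fraction(n + 1 + 2 * j, n + 1 + j)
    return value


n = 10
print([eta_moment_from_b0(n, k) for k in range(1, 5)])
print([eta_moment_closed(n, k) for k in range(1, 5)])
```

For $n = 10$ the second moment is $13/3$, that is $2^2 (n + 3)/(n + 2)$.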
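If we replace $x_i$ in the above ratios by $x_i - \bar{x}$ we find the moments of the ratio $\eta_1 = \delta_n^2 / \sum (x_i - \bar{x})^2$ to be:

(5.6)
$$m_1 = \frac{2n}{n - 1}, \qquad m_2 = \frac{2^2 n(n + 3)}{(n - 1)(n + 1)},$$
$$m_3 = \frac{2^3 n(n + 4)(n + 5)}{(n - 1)(n + 1)(n + 3)}, \qquad m_4 = \frac{2^4 n(n + 5)(n + 6)(n + 7)}{(n - 1)(n + 1)(n + 3)(n + 5)},$$
$$m_k = \frac{2^k n\,(n + 2k - 1)!}{(n + k)!\,(n - 1)(n + 1)(n + 3) \cdots (n + 2k - 3)},$$

and $(\eta_1 - 2)^2 = z$ has the distribution

(5.7) $$C_2\, z^{-1/2} (4 - z)^{(n-2)/2}$$

up to $\tfrac{1}{2}n$ moments.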
The ratio $\delta_n^2 / \sum x_i^2$ compares the variation of the first differences to that of the original variates. We might wish to compare the variation of the second differences to that of the first differences. For this purpose let us form the ratio

(5.8) $$\eta_2 = \frac{\sum_{i=1}^{n} (x_i - 2x_{i+1} + x_{i+2})^2}{\sum_{i=1}^{n} (x_i - x_{i+1})^2}, \qquad x_{n+i} = x_i,$$

to test the hypothesis $H_{\eta_2}$ that the variation of the second differences compared to the variation of the first differences is such as would occur by chance. Let $x_1, x_2, \cdots, x_n$ be normally distributed with mean $a$ and variance $\sigma^2$. The ratio $\eta_2$ is independent of the mean value of the variates, therefore we may consider a distribution with mean equal to zero. We shall develop the mean and variance of $\eta_2$ when the hypothesis to be tested is true. The moment generating function for the joint distribution of $D_2 = \sum (x_i - 2x_{i+1} + x_{i+2})^2 / 2\sigma^2$ and $D_1 = \sum (x_i - x_{i+1})^2 / 2\sigma^2$ is


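(5.9) $$\varphi(t_1, t_2) = E[\exp(D_2 t_1 + D_1 t_2)] = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \int \cdots \int \exp\left(D_2 t_1 + D_1 t_2 - \frac{\sum x_i^2}{2\sigma^2}\right) \prod_{i=1}^{n} dx_i.$$

We may find the moments then by a process similar to that outlined in (3.2) and (3.3). The next few steps are identical with (3.4) and (3.5). For the present problem, however, $a_1 = 1 - 6t_1 - 2t_2$, $a_2 = 4t_1 + t_2$, $a_3 = -t_1$, so that

(5.10)
$$\varphi^{-2}(t_1, t_2) = \prod_{k=1}^{n} \left[1 - 6t_1 - 2t_2 + (4t_1 + t_2)(\omega_k + \omega_k^{-1}) - t_1(\omega_k^2 + \omega_k^{-2})\right]$$
$$= \prod_{k=1}^{n} \left[1 - 6t_1 - 2t_2 + (8t_1 + 2t_2) \cos\frac{2\pi k}{n} - 2t_1 \cos\frac{4\pi k}{n}\right]$$
$$= \prod_{k=1}^{n} \left[a + b \cos\frac{2\pi k}{n} + c \cos\frac{4\pi k}{n}\right].$$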
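If we follow the same procedure indicated in (3.8) to (3.13) we obtain successively

(5.11) $$\varphi^{-2}(t_1, t_2) = \prod_{k=1}^{n} (a + b \cos\alpha_k + c \cos 2\alpha_k)$$

(5.12) $$= \exp\left(\sum_{k=1}^{n} \log(a + b \cos\alpha_k + c \cos 2\alpha_k)\right),$$

(5.13) $$\varphi(t_1, t_2) = \exp\left(-\frac{n}{4\pi} \sum_{k=1}^{n} \log(a + b \cos\alpha_k + c \cos 2\alpha_k)\, \Delta\alpha_k\right),$$

and replace the summation by the integral which is the limit of the summation as $n \to \infty$,

(5.14) $$\varphi(t_1, t_2) \sim \exp\left(-\frac{n}{4\pi} \int_0^{2\pi} \log(a + b \cos\alpha + c \cos 2\alpha)\, d\alpha\right)$$

(5.15) $$\sim \exp\left(-\frac{n}{2} \log\left\{\kappa \cdot \tfrac{1}{2}\left(1 + \sqrt{1 - \delta^2}\right) \cdot \tfrac{1}{2}\left(1 + \sqrt{1 - \eta^2}\right)\right\}\right),$$

where $\kappa = a - c$ and $\delta, \eta = \dfrac{b \pm \sqrt{b^2 + 8c^2 - 8ac}}{2(a - c)}$.

We then have approximately

(5.16) $$\varphi(t_1, t_2) \sim \left\{\tfrac{1}{4} \kappa \left[1 + \sqrt{1 - \delta^2}\right]\left[1 + \sqrt{1 - \eta^2}\right]\right\}^{-n/2}.$$

(5.14) follows from (5.13) if we replace the summation by an integral, and (5.15) is obtained in the following manner: replace $\cos 2\alpha$ by $2\cos^2\alpha - 1$, factor the resulting quadratic, and integrate the factors separately.

(5.17)
$$\int_0^{2\pi} \log(a + b \cos\alpha + c \cos 2\alpha)\, d\alpha = \int_0^{2\pi} \log(a - c + b \cos\alpha + 2c \cos^2\alpha)\, d\alpha$$
$$= \int_0^{2\pi} \log\kappa\, d\alpha + \int_0^{2\pi} \log(1 + \delta \cos\alpha)\, d\alpha + \int_0^{2\pi} \log(1 + \eta \cos\alpha)\, d\alpha$$
$$= 2\pi \log\kappa + 2\pi \log\tfrac{1}{2}\left(1 + \sqrt{1 - \delta^2}\right) + 2\pi \log\tfrac{1}{2}\left(1 + \sqrt{1 - \eta^2}\right).$$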
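The factorization behind (5.17) can be checked by direct numerical integration. The sketch below evaluates the integral with a simple equally spaced sum, which is very accurate for smooth periodic integrands, and compares it with the closed form. The function names and the test point $t_1 = 0.01$, $t_2 = -0.1$ are choices made for this illustration; the point is picked so that $\delta$ and $\eta$ are real and less than one in absolute value.

```python
import math


def integral_numeric(a, b, c, steps=4000):
    """Equally spaced sum for the integral of log(a + b cos x + c cos 2x) on [0, 2*pi]."""
    h = 2 * math.pi / steps
    return h * sum(
        math.log(a + b * math.cos(k * h) + c * math.cos(2 * k * h))
        for k in range(steps)
    )


def log_half_factor(e):
    """log of (1 + sqrt(1 - e^2)) / 2, one factor of the closed form."""
    return math.log(0.5 * (1 + math.sqrt(1 - e * e)))


def integral_closed(a, b, c):
    """Right-hand side of (5.17)."""
    kappa = a - c
    root = math.sqrt(b * b + 8 * c * c - 8 * a * c)
    delta = (b + root) / (2 * kappa)
    eta = (b - root) / (2 * kappa)
    return 2 * math.pi * (math.log(kappa) + log_half_factor(delta) + log_half_factor(eta))


t1, t2 = 0.01, -0.1
a, b, c = 1 - 6 * t1 - 2 * t2, 8 * t1 + 2 * t2, -2 * t1
print(integral_numeric(a, b, c), integral_closed(a, b, c))
```

When $t_2^2 + 4t_1 < 0$ the roots $\delta$ and $\eta$ become complex conjugates; this sketch does not cover that case.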


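If we now expand (5.16) by multiplying the factors within the brackets and substitute for $\kappa$, $\eta$ and $\delta$ we find

(5.18) $$\varphi(t_1, t_2) \sim (A + B + C + D)^{-n/2} = P^{-n/2},$$

where

$$a = 1 - 6t_1 - 2t_2, \qquad b = 8t_1 + 2t_2, \qquad c = -2t_1,$$
$$A = \tfrac{1}{4}(1 - 4t_1 - 2t_2),$$
$$B = \tfrac{1}{4}\left\{1 - 12t_1 - 4t_2 + 8t_1 t_2 + 2t_2^2 - 2(4t_1 + t_2)\sqrt{t_2^2 + 4t_1}\right\}^{1/2},$$
$$C = \tfrac{1}{4}\left\{1 - 12t_1 - 4t_2 + 8t_1 t_2 + 2t_2^2 + 2(4t_1 + t_2)\sqrt{t_2^2 + 4t_1}\right\}^{1/2},$$
$$D = \tfrac{1}{4}(1 - 16t_1 - 4t_2)^{1/2}.$$

From (5.18) $P = A + B + C + D$ and at $t_1 = 0$

$$P = \tfrac{1}{4}\left(1 + \sqrt{1 - 4t_2}\right)^2,$$
$$\frac{\partial P}{\partial t_1} = -2\left[1 + 2(1 - 4t_2)^{-1/2}\right],$$
$$\frac{\partial^2 P}{\partial t_1^2} = \frac{-32}{(1 - 4t_2)^{3/2}} - \frac{\left(1 - \sqrt{1 - 4t_2}\right)^2}{2t_2^2},$$

and

$$\frac{\partial \varphi}{\partial t_1} = -\tfrac{1}{2} n P^{-n/2 - 1} \frac{\partial P}{\partial t_1},$$
$$\frac{\partial^2 \varphi}{\partial t_1^2} = -\tfrac{1}{2} n P^{-n/2 - 2} \left[-\left(\tfrac{1}{2}n + 1\right)\left(\frac{\partial P}{\partial t_1}\right)^2 + P \frac{\partial^2 P}{\partial t_1^2}\right].$$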





If we substitute in this formula and integrate the first with respect to $t_2$ we shall obtain the first moment of the ratio $\eta_2$. If we integrate the second twice with respect to $t_2$, we shall obtain the second moment of the ratio $\eta_2$. We find these moments to be

(5.22) $$m_1 = \frac{3n + 2}{n + 1}, \qquad m_2 = \frac{9n^2 + 23n + 12}{(n + 1)(n + 2)}, \qquad \sigma^2 = \frac{2n^2 + 7n + 4}{(n + 1)^2 (n + 2)}.$$
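The three expressions for the moments of $\eta_2$ are tied together by $\sigma^2 = m_2 - m_1^2$, which offers a quick exact check on the formulas as read from (5.22). The function name below is hypothetical.

```python
from fractions import Fraction


def eta2_moments(n):
    """First moment, second moment and variance of eta_2 as given in (5.22)."""
    m1 = Fraction(3 * n + 2, n + 1)
    m2 = Fraction(9 * n * n + 23 * n + 12, (n + 1) * (n + 2))
    var = Fraction(2 * n * n + 7 * n + 4, (n + 1) ** 2 * (n + 2))
    return m1, m2, var


for n in (5, 10, 50):
    m1, m2, var = eta2_moments(n)
    print(n, float(m1), float(var), m2 - m1 * m1 == var)
```

As $n$ grows the mean approaches 3 and the variance behaves like $2/n$.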


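6. Likelihood criteria for multiple serial correlation. Given a sample of $n$ observations, $x_1, x_2, \cdots, x_n$, we shall assume that they are distributed according to the law

(6.1) $$dP_n = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \exp\left\{-\frac{1}{2\sigma^2} \sum_{\alpha=1}^{n} \left(x_\alpha - a - \sum_{i=1}^{r} b_i x_{\alpha - l_i}\right)^2\right\} dx_1 \cdots dx_n, \qquad (x_\alpha = x_{n+\alpha}),$$

that is, that each variate, say at the time $t$, has as its mean value a linear function of the variates at time $t - l_1$, $t - l_2$, etc. Let us investigate the likelihood criteria for testing the hypothesis, $H_r$, that each variate is independent of the others; i.e. that the $b_i = 0$ $(i = 1, \cdots, r)$. For the hypothesis $H_r$ we define the space $\Omega$ and the space $\omega$ as follows:

(6.2) $$\Omega:\ \sigma^2 > 0,\ -\infty < a, b_i < \infty; \qquad \omega:\ \sigma^2 > 0,\ -\infty < a < \infty,\ b_i = 0;$$

we find the likelihood ratio criterion

(6.3) $$\lambda_r^{2/n} = \frac{|a_{ij}|}{a_{00}\, |a_{pq}|} \qquad (i, j = 0, 1, \cdots, r;\ p, q = 1, \cdots, r),$$

in which

(6.4) $$a_{ij} = \sum_{\alpha} (x_{\alpha + l_i} - \bar{x})(x_{\alpha + l_j} - \bar{x}), \qquad l_0 = 0,$$

and it is noted that $a_{ii} = a_{00}$ and if the $l_i$ are equispaced $a_{i,\,i+k} = a_{0k}$. $\lambda_r$ is a statistic which measures how completely each variate at time $t$ can be expressed as a linear function of variates spaced at time $t - l_1$, $t - l_2$, etc.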

Next we shall develop a statistic for testing the hypothesis, $H_{r,m}$, that of the
set of the values $b_i$ ($i = 1, \dots, r$) in (6.1) the subset $b_{m+1}, b_{m+2}, \dots, b_r$ are zero.
Here we have the same likelihood function but for $H_{r,m}$ we define the spaces
$\Omega$ and $\omega$ as follows:


Q: "Ss 0 —2x <a,bi < ~; 
(6.5) . en | — < @, ~~ < wy, Dw — 


u=(l,---,m), w 







and obtain the criterion 


(6.6) N2/n = | as; | | aus | 


‘| pq | Ase | 


(m <r). 


The form and the derivation of these $\lambda$ criteria parallel very closely those of the
likelihood ratio criteria obtained in multivariate analysis for testing significance
of regression coefficients.

Case I. If we set $r = 1$ in $\lambda_r$ we obtain


(6.7)  $\lambda_1^{2/n} = \dfrac{\begin{vmatrix} a_{00} & a_{01} \\ a_{01} & a_{00} \end{vmatrix}}{a_{00}\, a_{00}}$


for which the distribution is given in (3.24).

Case II. If we set $r = 2$, we have


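(6.8)  $\lambda_2^{2/n} = \dfrac{\begin{vmatrix} a_{00} & a_{01} & a_{02} \\ a_{01} & a_{11} & a_{12} \\ a_{02} & a_{12} & a_{22} \end{vmatrix}}{a_{00} \begin{vmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{vmatrix}},$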


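for which if we take $l_1 = 1$, $l_2 = 2$ we get

(6.9)  $\lambda_2^{2/n} = \dfrac{\dfrac{1}{a_{00}} \begin{vmatrix} a_{00} & a_{01} & a_{02} \\ a_{01} & a_{00} & a_{01} \\ a_{02} & a_{01} & a_{00} \end{vmatrix}}{\begin{vmatrix} a_{00} & a_{01} \\ a_{01} & a_{00} \end{vmatrix}}.$

The expanded form of this numerator is $a_{00}^3 + 2a_{01}^2 a_{02} - a_{00} a_{02}^2 - 2a_{01}^2 a_{00}$.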
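The criterion (6.9) can be computed directly from a sample. The sketch below is illustrative only: it assumes that $a_{0k}$ is the circular sum of lagged products about the sample mean (subscripts reduced mod $n$), and all function names are hypothetical. It also confirms that the expanded numerator agrees with the cofactor expansion of the third order determinant.

```python
def lag_products(x, max_lag):
    """Circular lagged products a_00, a_01, ..., about the sample mean (an assumed reading of (6.4))."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[i] * d[(i + lag) % n] for i in range(n)) for lag in range(max_lag + 1)]

def det3_toeplitz(a0, a1, a2):
    """Cofactor expansion of the determinant of [[a0,a1,a2],[a1,a0,a1],[a2,a1,a0]]."""
    return a0 * (a0 * a0 - a1 * a1) - a1 * (a1 * a0 - a1 * a2) + a2 * (a1 * a1 - a0 * a2)

def expanded_numerator(a0, a1, a2):
    """Expanded form quoted in the text for the third order determinant."""
    return a0 ** 3 + 2 * a1 * a1 * a2 - a0 * a2 * a2 - 2 * a1 * a1 * a0

def criterion_r2(x):
    """Value of the criterion (6.9) for lags l_1 = 1, l_2 = 2."""
    a0, a1, a2 = lag_products(x, 2)
    return expanded_numerator(a0, a1, a2) / (a0 * (a0 * a0 - a1 * a1))

sample = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 2.5, 3.5]
print(criterion_r2(sample))
```

Values near 1 indicate that the lagged variates explain little of the variation; values near 0 indicate strong linear dependence on the lagged variates.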
Let us consider 


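(6.10)  ${}_0\phi_2(\theta_0, \theta_1, \theta_2) = \left(\dfrac{1}{2\pi\sigma^2}\right)^{n/2} \int \cdots \int \exp\Bigl\{-\dfrac{1}{2\sigma^2}\Bigl[\sum x_\alpha^2 (1 - \theta_0) - \theta_1 \sum x_\alpha x_{\alpha+1} - \theta_2 \sum x_\alpha x_{\alpha+2}\Bigr]\Bigr\} \prod_{\alpha=1}^{n} dx_\alpha.$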


We shall find the mean and variance of ${}_0\lambda_2^{2/n}$ (mean = 0) when the hypothesis
$H_r$, ($r = 2$) is true. We can find the first moment of ${}_0\lambda_2^{2/n}$ then by performing
the following operations: (a) compute


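(6.11)  $\dfrac{\partial^3 \phi}{\partial\theta_0^3} + 2\dfrac{\partial^3 \phi}{\partial\theta_1^2\, \partial\theta_2} - \dfrac{\partial^3 \phi}{\partial\theta_0\, \partial\theta_2^2} - 2\dfrac{\partial^3 \phi}{\partial\theta_1^2\, \partial\theta_0}$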


(this will give the first moment of the numerator) and set $\theta_2 = 0$, (b) integrate
from $-\infty$ to $\theta_0 + \theta_1$ with respect to $\theta_0 + \theta_1 = \zeta$, from $-\infty$ to $\theta_0 - \theta_1$ with
respect to $\theta_0 - \theta_1 = \xi$, and set $\theta_1 = 0$ (at this point we will have the first moment
of the third order determinant divided by the second order determinant), (c)









integrate with respect to $\theta_0$ from $-\infty$ to 0. The reason for step (b) is easily
seen since the second order determinant $a_{00}^2 - a_{01}^2$ may be written
$(a_{00} - a_{01})(a_{00} + a_{01})$.

Further moments may be computed in a similar manner. ${}_0\phi_2^{-2}(\theta_0, \theta_1, \theta_2)$
may be written as a determinant in the manner indicated in (3.4) and (3.5).
Here, $a_1 = 1 - \theta_0$, $a_2 = a_n = -\tfrac{1}{2}\theta_1$, and $a_3 = a_{n-1} = -\tfrac{1}{2}\theta_2$ and $a_4$ to $a_{n-2} = 0$,
then


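(6.12)  ${}_0\phi_2^{-2}(\theta_0, \theta_1, \theta_2) = \prod_{k=1}^{n} \sum_{i=1}^{n} a_i \omega_k^{i-1} = \prod_{k=1}^{n} \Bigl(a_1 + 2a_2 \cos\dfrac{2\pi k}{n} + 2a_3 \cos\dfrac{4\pi k}{n}\Bigr).$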


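We shall approximate ${}_0\phi_2(\theta_0, \theta_1, \theta_2)$ by the method contained between (5.11)
and (5.18). We set

(6.13)  ${}_0\phi_2^{-2}(\theta_0, \theta_1, \theta_2) = \prod_{k=1}^{n} \Bigl(a + b \cos\dfrac{2\pi k}{n} + c \cos\dfrac{4\pi k}{n}\Bigr)$

and obtained

(6.14)  ${}_0\phi_2(\theta_0, \theta_1, \theta_2) \sim (A + B + C + D)^{-n/2} = P^{-n/2}$

where

(6.15)  $A = \tfrac{1}{4}(a - c),$

        $B = \tfrac{1}{8}\bigl(4a^2 - 4c^2 - 2b^2 - 2bE^{1/2}\bigr)^{1/2} = \tfrac{1}{8}\beta^{1/2},$

        $C = \tfrac{1}{8}\bigl(4a^2 - 4c^2 - 2b^2 + 2bE^{1/2}\bigr)^{1/2} = \tfrac{1}{8}\gamma^{1/2},$

        $D = \tfrac{1}{4}\bigl((a + c)^2 - b^2\bigr)^{1/2} = \tfrac{1}{4}\delta^{1/2},$

        $E = b^2 + 8c(c - a).$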
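The quality of the approximation (6.14) can be examined numerically by comparing the exact product (6.13) with $P^n$. The sketch below assumes the reading of (6.15) with $A = \tfrac14(a - c)$, $B$ and $C$ carrying the factor $\tfrac18$, $D$ the factor $\tfrac14$, and $E = b^2 + 8c(c - a)$; the parameter values are arbitrary illustrations chosen so that $E \ge 0$ and every factor of the product is positive, and the function names are hypothetical.

```python
import math

def exact_product(a, b, c, n):
    """Exact product over k = 1..n of a + b*cos(2*pi*k/n) + c*cos(4*pi*k/n)."""
    return math.prod(
        a + b * math.cos(2 * math.pi * k / n) + c * math.cos(4 * math.pi * k / n)
        for k in range(1, n + 1)
    )

def P_closed(a, b, c):
    """P = A + B + C + D as read from (6.15); requires E >= 0."""
    E = b * b + 8 * c * (c - a)
    root_E = math.sqrt(E)
    A = (a - c) / 4
    B = math.sqrt(4 * a * a - 4 * c * c - 2 * b * b - 2 * b * root_E) / 8
    C = math.sqrt(4 * a * a - 4 * c * c - 2 * b * b + 2 * b * root_E) / 8
    D = math.sqrt((a + c) ** 2 - b * b) / 4
    return A + B + C + D

a, b, c, n = 1.0, 0.3, -0.1, 24
# Compare the n-th root of the exact product with P.
print(exact_product(a, b, c, n) ** (1.0 / n), P_closed(a, b, c))
```

For these values the two printed numbers should agree closely, which is consistent with the product being replaceable by the $n$th power of a single expression.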


It is easily seen that we may operate (differentiate and integrate) with $a$, $b$,
$c$, in place of $\theta_0$, $\theta_1$, $\theta_2$ respectively. Therefore, we compute








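(6.16)  $\dfrac{\partial \phi}{\partial c} = -\tfrac{1}{2} n P^{-n/2 - 1} \dfrac{\partial P}{\partial c},$

        $\dfrac{\partial^2 \phi}{\partial c^2} = -\tfrac{1}{2} n P^{-n/2 - 2} \Bigl[-\bigl(\tfrac{1}{2} n + 1\bigr)\Bigl(\dfrac{\partial P}{\partial c}\Bigr)^2 + P \dfrac{\partial^2 P}{\partial c^2}\Bigr]$

and since $P = A + B + C + D$ we compute

(6.17)  $\dfrac{\partial A}{\partial c} = -\tfrac{1}{4}; \quad \dfrac{\partial^2 A}{\partial c^2} = 0;$

        $\dfrac{\partial B}{\partial c} = \tfrac{1}{16} \beta^{-1/2} \dfrac{\partial \beta}{\partial c}; \quad \dfrac{\partial^2 B}{\partial c^2} = \tfrac{1}{16}\Bigl[-\tfrac{1}{2}\beta^{-3/2}\Bigl(\dfrac{\partial \beta}{\partial c}\Bigr)^2 + \beta^{-1/2} \dfrac{\partial^2 \beta}{\partial c^2}\Bigr];$

        $\dfrac{\partial C}{\partial c} = \tfrac{1}{16} \gamma^{-1/2} \dfrac{\partial \gamma}{\partial c}; \quad \dfrac{\partial^2 C}{\partial c^2} = \tfrac{1}{16}\Bigl[-\tfrac{1}{2}\gamma^{-3/2}\Bigl(\dfrac{\partial \gamma}{\partial c}\Bigr)^2 + \gamma^{-1/2} \dfrac{\partial^2 \gamma}{\partial c^2}\Bigr];$

        $\dfrac{\partial D}{\partial c} = \tfrac{1}{8} \delta^{-1/2} \dfrac{\partial \delta}{\partial c}; \quad \dfrac{\partial^2 D}{\partial c^2} = \tfrac{1}{8}\Bigl[-\tfrac{1}{2}\delta^{-3/2}\Bigl(\dfrac{\partial \delta}{\partial c}\Bigr)^2 + \delta^{-1/2} \dfrac{\partial^2 \delta}{\partial c^2}\Bigr];$

        $\dfrac{\partial E}{\partial c} = 16c - 8a; \quad \dfrac{\partial^2 E}{\partial c^2} = 16.$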










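In order to evaluate the expressions in (6.17) we must find

(6.18)  $\dfrac{\partial \beta}{\partial c} = -8c - bE^{-1/2} \dfrac{\partial E}{\partial c}; \quad \dfrac{\partial^2 \beta}{\partial c^2} = -8 - b\Bigl[-\tfrac{1}{2} E^{-3/2}\Bigl(\dfrac{\partial E}{\partial c}\Bigr)^2 + E^{-1/2} \dfrac{\partial^2 E}{\partial c^2}\Bigr];$

        $\dfrac{\partial \gamma}{\partial c} = -8c + bE^{-1/2} \dfrac{\partial E}{\partial c}; \quad \dfrac{\partial^2 \gamma}{\partial c^2} = -8 + b\Bigl[-\tfrac{1}{2} E^{-3/2}\Bigl(\dfrac{\partial E}{\partial c}\Bigr)^2 + E^{-1/2} \dfrac{\partial^2 E}{\partial c^2}\Bigr];$

        $\dfrac{\partial \delta}{\partial c} = 2(a + c); \quad \dfrac{\partial^2 \delta}{\partial c^2} = 2.$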
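If we now set $c = 0$, we obtain

(6.19)  $P = \tfrac{1}{2}\bigl(a + (a^2 - b^2)^{1/2}\bigr),$

        $\dfrac{\partial P}{\partial c} = \dfrac{a - (a^2 - b^2)^{1/2}}{2(a^2 - b^2)^{1/2}},$

        $\dfrac{\partial^2 P}{\partial c^2} = \dfrac{2a^4 - 4a^2 b^2 + b^4 + (-2a^3 + 2ab^2)(a^2 - b^2)^{1/2}}{2b^2 (a^2 - b^2)^{3/2}}.$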
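The derivatives of $P$ at $c = 0$ can be cross-checked against finite differences of $P = A + B + C + D$. This is a sketch under stated assumptions: $P$ is coded from the reading of (6.15) with $E = b^2 + 8c(c - a)$, the derivative formulas are taken as $\partial P/\partial c = (a - s)/(2s)$ and $\partial^2 P/\partial c^2 = [2a^4 - 4a^2b^2 + b^4 + (-2a^3 + 2ab^2)s]/(2b^2 s^3)$ with $s = (a^2 - b^2)^{1/2}$, and the values $a = 1$, $b = 0.6$ are arbitrary.

```python
import math

def P(a, b, c):
    """P = A + B + C + D as read from (6.15); requires E >= 0."""
    E = b * b + 8.0 * c * (c - a)
    root_E = math.sqrt(E)
    A = (a - c) / 4.0
    B = math.sqrt(4 * a * a - 4 * c * c - 2 * b * b - 2 * b * root_E) / 8.0
    C = math.sqrt(4 * a * a - 4 * c * c - 2 * b * b + 2 * b * root_E) / 8.0
    D = math.sqrt((a + c) ** 2 - b * b) / 4.0
    return A + B + C + D

def dP_dc_at_zero(a, b):
    """First derivative of P with respect to c, evaluated at c = 0."""
    s = math.sqrt(a * a - b * b)
    return (a - s) / (2.0 * s)

def d2P_dc2_at_zero(a, b):
    """Second derivative of P with respect to c, evaluated at c = 0."""
    s = math.sqrt(a * a - b * b)
    num = 2 * a ** 4 - 4 * a * a * b * b + b ** 4 + (-2 * a ** 3 + 2 * a * b * b) * s
    return num / (2.0 * b * b * s ** 3)

a, b = 1.0, 0.6
h1, h2 = 1e-5, 1e-3
# Central differences in c around c = 0.
first = (P(a, b, h1) - P(a, b, -h1)) / (2 * h1)
second = (P(a, b, h2) - 2 * P(a, b, 0.0) + P(a, b, -h2)) / (h2 * h2)
print(first, dP_dc_at_zero(a, b))
print(second, d2P_dc2_at_zero(a, b))
```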


We may now substitute these values in (6.16) and then substitute the resulting
expressions in (6.11). The remaining values that are required for (6.11) are
easily computed since they may be obtained from $\phi$ with $c = 0$, i.e.

(6.20)  ${}_0\phi_2(\theta_0, \theta_1, 0) = \bigl[\tfrac{1}{2}\bigl(a + (a^2 - b^2)^{1/2}\bigr)\bigr]^{-n/2}.$

The result of these substitutions gives
—e *(n + — 

8(a? — b?)! . 
in which we set $d = \tfrac{1}{2}(a - b)$ and $e = \tfrac{1}{2}(a + b)$ and integrate with respect to
$d$ and $e$. We obtain


(6.21) 


(6.22) a+ @— ey, 


an + “5h 
and if we set $b = 0$ and integrate with respect to $a$, setting $a = 1$ ($\theta_0 = 0$), we
finally have

(6.23) E(@z") = — 


n+2° 


We shall now obtain the first moment of $\lambda_2^{2/n}$ without the restriction that the
mean equal zero. For this purpose let us consider


(6.24) o2(6o, 6; 02) _ @= 3) [- [oe —(1 1202) [agg(1— 69)— 29191—a9282) Il dX. 


Here $a_1 = 1 - \theta_0 + m$, $a_2 = a_n = -\tfrac{1}{2}\theta_1 + m$, $a_3 = a_{n-1} = -\tfrac{1}{2}\theta_2 + m$, $a_4$ to
$a_{n-2} = m$ where $m = (\theta_0 + \theta_1 + \theta_2)/n$. Expanding the determinant as in
(6.12) we find


1 (60, 01, 2) = Il : swe 


k=1 i=1 


(6.25) 


n 


I (a + 2a, on = es 2a3 on ® — ey Ss /aeui). 


k=1 


i=4 










Now 
=m > wi 
i=4 
ee ee 


m(n — 5), 


(6.26) 


so that 


n—l 


9 * ie 
(6.27) go (00, 0:1, %) = I] (« + 2a». cos ata + 2a; cos *) ; 
k=1 n n 
We have obtained here a product which is the same as that in (6.12) except that
the last factor is missing. The approximation corresponding to (6.14) will
then be

(6.28)  $\phi_2(\theta_0, \theta_1, \theta_2) \sim \dfrac{(A + B + C + D)^{-n/2}}{(a + b + c)^{-1/2}},$

since we may take the approximation for the product from 1 to $n$ and divide
by the last factor, $(a + b + c)$. The procedure for finding the first moment for
$\lambda_2^{2/n}$ (mean = $a$) is exactly the same as that outlined for finding the first moment
of ${}_0\lambda_2^{2/n}$ (mean = 0). We obtain

(6.29)  $E(\lambda_2^{2/n}) = \dfrac{n - 1}{n + 1}.$
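Case III. If we set $r = 2$, $m = 1$ in $\lambda_{r,m}$ we have, if we take $l_1 = 1$, $l_2 = 2$,

(6.30)  $\lambda_{2,1}^{2/n} = \dfrac{a_{00} \begin{vmatrix} a_{00} & a_{01} & a_{02} \\ a_{01} & a_{00} & a_{01} \\ a_{02} & a_{01} & a_{00} \end{vmatrix}}{\begin{vmatrix} a_{00} & a_{01} \\ a_{01} & a_{00} \end{vmatrix}^2}.$

To find the moments of $\lambda_{2,1}$ let us consider the following distribution,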


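(6.31)  $dP_n = \left(\dfrac{1}{\sqrt{2\pi}\, \sigma}\right)^{n} \exp\Bigl\{-\dfrac{1}{2\sigma^2} \sum_{\alpha=1}^{n} \bigl[x_\alpha - a - \beta(x_{\alpha+1} - a)\bigr]^2\Bigr\} \prod_{\alpha=1}^{n} dx_\alpha,$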


in which $\beta$ represents the population value of the serial correlation coefficient.
The moment generating function for the joint distribution of $a_{00}/2\sigma^2$, $a_{01}/2\sigma^2$
and $a_{02}/2\sigma^2$ will be


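(6.32)  $\phi_{2,1}(\theta_0, \theta_1, \theta_2) = \left(\dfrac{1}{\sqrt{2\pi}\, \sigma}\right)^{n} \int \cdots \int \exp\Bigl\{-\dfrac{1}{2\sigma^2}\Bigl[\sum_{\alpha=1}^{n} \bigl((x_\alpha - \bar{x}) - \beta(x_{\alpha+1} - \bar{x})\bigr)^2 - a_{00}\theta_0 - a_{01}\theta_1 - a_{02}\theta_2\Bigr]\Bigr\} \prod dx_\alpha$

        $= \left(\dfrac{1}{\sqrt{2\pi}\, \sigma}\right)^{n} \int \cdots \int \exp\Bigl\{-\dfrac{1}{2\sigma^2}\bigl[a_{00}(1 + \beta^2 - \theta_0) + a_{01}(-2\beta - \theta_1) + a_{02}(-\theta_2)\bigr]\Bigr\} \prod dx_\alpha.$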







This function is very similar to (6.24). The approximation to $\phi_{2,1}(\theta_0, \theta_1, \theta_2)$
here will be exactly the same as that obtained in (6.28) for $\phi_2(\theta_0, \theta_1, \theta_2)$ except
that here $a = 1 + \beta^2 - \theta_0$, $b = -2\beta - \theta_1$, $c = -\theta_2$. For the case where the
mean is zero, we find the approximation (6.14) in which $a$, $b$, and $c$ have the
above values.

We may obtain the first moment of $\lambda_{2,1}^{2/n}$ by operating on the function
$\phi_{2,1}(\theta_0, \theta_1, \theta_2)$ proceeding as follows: (a) compute (6.11) as before and set $\theta_2 = 0$,
(b) integrate from $-\infty$ to $\theta_0 + \theta_1$ with respect to $\theta_0 + \theta_1 = \zeta$, from $-\infty$ to $\theta_0 - \theta_1$
with respect to $\theta_0 - \theta_1 = \xi$ (at this point we will have the first moment of the
third order determinant divided by the second order determinant), (c) differentiate
with respect to $\theta_0$, (d) repeat step (b), and set $\theta_0$ and $\theta_1 = 0$.

The first two steps for obtaining the first moment of ${}_0\lambda_{2,1}^{2/n}$ (mean = 0) were
performed for the first moment of ${}_0\lambda_2^{2/n}$ so that we may perform step (c) on (6.22).
We obtain
(6.33) niMa + (a — YP 


4(a?— Bi’ 
and finally by step (d) we have 


(6.34) BO) = 74 Bet @ - HHT, 


in which $a = 1 + \beta^2$ and $b = -2\beta$ since $\theta_0$ and $\theta_1$ have been set equal to zero.
Substitution of these values in (6.34) shows that it is independent of $\beta$, and we
find
2/ n 

35 E(odo'1 ) = ——.. 
(6.35) (aif) = 7 
Using $\phi_{2,1}(\theta_0, \theta_1, \theta_2)$, the generating function for $\lambda_{2,1}$ (mean = $a$), we find
(6.36) EQ?) =” —. 

The procedure for obtaining the second moments of the above criteria consists
essentially of performing twice the operations prescribed for obtaining the first
moments. The details given in connection with the first moment are sufficient
to indicate the procedure. The details for the second moments are too complicated
algebraically to list here. Table I indicates the second moments obtained
as well as other moments obtained in the earlier parts of the paper.


7. Serial correlation in several variables. Given a sample of $n$ observations
on each of $k$ variables $x_i$, $i = 1, \dots, k$, we shall assume they are distributed
as follows:

A* 
~ (Qn) int 
We wish to test the hypothesis $H_k$, that there is no serial correlation, i.e., that
$b_i = 0$, $i = 1, \dots, k$. For this purpose let us define the space $\Omega$ and $\omega$ as follows:


(7.1) d?’.. 


4S Ag j(2iq—a5—b§2i,a+k) (2jaq—2j—bj2j,a+k) Il dz: 
tae 


(7.2)  $\Omega$: $\| A_{ij} \|$ pos. def., $-\infty < a_i, b_i < \infty$;
       $\omega$: $\| A_{ij} \|$ pos. def., $-\infty < a_i < \infty$; $b_i = 0$.





(2.10)ff. 


(2.10) 


(6.9)ff. 


(6.9) 


(6.30)ff. 


(6.30) 


The mean of bo and b were 




TABLE I 


E(x?) 
1 
n+2 
d 
n+1 
4(n + 3) 
n+2 
4n(n + 3) 
(n — 1)(n + 1) 
On> + 23n 
(n + 1)(m 
(n + 1)(n 
(n + 2)(n 
n(n + 2) 
(n + 1)(nm + 3) 
n 
n+ 4 
n— |] 
n+3 
n(n + 2) 
(n + 1)(n + 8) 


(n — 1)(n + 1) 
n(n + 2) 


o2 
1 
n-+2 
n(n — 3) 
(n — 1)?(n + 1) 
4 
n+2 
4n(n — 3) 
(n — 1)?(n + 1) 
2n° + 7n + 4 
(n + 1)2(m + 2) 
2(n + i) 
+ 2)?(n + 4) 
2n 
+ 1)*(n + 3) 
4n 
+ 2)*(n + 4) 
4(n — 1) 
+ 1)?(n + 3) 
2n 
+ iFin + 3) 
2(n — 1) 


n(n +- 2) 


also obtained by Anderson [8].


The development of the appropriate $\lambda$ criterion for this case parallels very closely
the development of the $\lambda$ criteria in multiple regression analysis. The criterion
obtained for testing the hypothesis $H_k$ is


(7.3) 


where 


Z/n _ 
Nein ii 


| is bis | 
| bi; Qi; 
’ 


| as |? 


ai; = >> (tia — &4) (Xia — 4), 
bj = 3[DE ia 


The probability theory for the \ criteria in (7.3) remains to be developed. 


— Ei) (Tj,a42 = E;) + > te oat _ Ei) (Wa — £})]. 







8. Summary. A problem in serial correlation which has received considerable
attention is that of devising a statistic for indicating the presence of a relation
between successive observations, i.e. a lack of independence of the order in which
the observations were drawn. Von Neumann developed the distribution and
moments of the ratio of the mean square successive difference to the variance.
R. L. Anderson presented the distribution of a serial correlation coefficient which
is the ratio $R = \sum x_\alpha x_{\alpha+l} / \sum x_\alpha^2$ ($l \ge 1$, subscripts reduced mod $n$).

The present investigation was undertaken with the object of developing the
likelihood ratio functions for testing various hypotheses connected with serial
correlation in one or more variables and determining the moments and in some
cases the distributions of these likelihood ratios.

The variates are considered to be ordered by their subscripts $\alpha = 1, \dots, n$.
The introduction of $x_{n+1} = x_1$, $x_{n+2} = x_2$ etc. is made to obtain a symmetry
which greatly simplifies the problem.

The likelihood ratio criteria were developed for testing the hypotheses

a) that $x_\alpha$ is independent of $x_{\alpha+l}$,

b) that $x_\alpha$ is independent of $x_{\alpha+l_i}$, $i = 1, \dots, r$,

c) that $x_\alpha$ is independent of some subset of the $x_{\alpha+l_i}$,

d) that in the case of several variables $x_{i\alpha}$, $i = 1, \dots, k$, $\alpha = 1, \dots, n$,
the $x_{i\alpha}$, $i = 1, \dots, k$ are independent of the $x_{i,\alpha+l}$.

These criteria are similar in form to those obtained in regression analysis.

The likelihood ratio criterion for testing the hypothesis a) turns out to be
$\lambda = (1 - R^2)^{n/2}$ where $R$ is the function given above. The moments of $R$ are
obtained and from these the moments of $\lambda^{2/n}$. These moments are found to
agree with those of a Pearson Type I curve to $n/2$ moments. A simple transformation
gives us the moments of a ratio differing from that used by von
Neumann by the addition of the term $(x_n - x_1)^2$ to the numerator. A simplification
of the moments is attained by this change. In fact, if we denote this
altered statistic by $\eta$ we find that $(\eta - 2)^2$ is distributed according to a Pearson
Type I curve to $n/2$ moments.

The mean and variance were determined for the ratio of the sum of squares 
of the second successive differences to the first successive differences. 

The mean and variance are obtained for the likelihood criteria for testing
the hypothesis b) for $r = 2$, and for testing the hypothesis c) for $r = 2$ where
$x_{\alpha+l_2}$ is the subset of $x_{\alpha+l_i}$, ($i = 1, 2$).

All the above moments were obtained under the assumption that the hypothesis
to be tested was true. No results have been obtained thus far in cases b)
and c) for a general $r$ nor for hypothesis d).

The moments for the several cases above were obtained by the use of moment
generating functions which, for the criteria used, took the form of the product of
$n$ terms. In the case a) it was shown that the product could be approximately
represented by the $n$th power of a single expression which was equivalent for the
purpose of obtaining the first $n$ moments. A method was developed for making
analogous approximations to the generating functions for cases b) and c) since
it was not found possible to obtain the moments from the products in their
original form.


The author wishes to express his gratitude to Professor S. S. Wilks under
whose helpful direction this paper was written.


REFERENCES 


[1] J. von Neumann, R. H. Kent, H. R. Bellinson, and B. I. Hart, “The mean square
successive difference,” Annals of Math. Stat., Vol. 12 (1941), pp. 153-162.

[2] J. von Neumann, “Distribution of the ratio of the mean square difference to the
variance,” Annals of Math. Stat., Vol. 12 (1941), pp. 367-395.

[3] B. I. Hart and J. von Neumann, “Tabulation of the probabilities of the ratio of the
mean square successive difference to the variance,” Annals of Math. Stat., Vol.
13 (1942), pp. 207-214.

[4] J. von Neumann, “A further remark concerning the distribution of the ratio of the
mean square successive difference to the variance,” Annals of Math. Stat., Vol.
13 (1942), pp. 86-88.

[5] J. D. Williams, “Moments of the ratio of the mean square successive difference to the
mean square difference in samples from a normal universe,” Annals of Math.
Stat., Vol. 12 (1941), pp. 239-241.

[6] L. C. Young, “On randomness in ordered sequences,” Annals of Math. Stat., Vol.
12 (1941), pp. 293-300.

[7] R. L. Anderson, “Serial correlation in the analysis of time series,” unpublished thesis,
Iowa State College, 1941.

[8] R. L. Anderson, “Distribution of the serial correlation coefficient,” Annals of Math.
Stat., Vol. 13 (1942), pp. 1-13.

[9] T. Koopmans, “Serial correlation and quadratic forms in normal variables,” Annals
of Math. Stat., Vol. 13 (1942), pp. 14-33.

[10] P. S. Laplace, Traite de Mecanique Celeste, Vol. 2, paragraph 21.





ON A STATISTICAL PROBLEM ARISING IN THE CLASSIFICATION
OF AN INDIVIDUAL INTO ONE OF TWO GROUPS¹


By ABRAHAM WALD 


Columbia University 


1. Introduction. In social, economic and industrial problems we are often
confronted with the task of classifying an individual into one of two groups on the
basis of a number of test scores. For example, in the case of personnel selection
the acceptance or rejection of an applicant is frequently based on a number of
test scores obtained by the applicant. A similar situation arises in connection
with college entrance examinations. Again, on the basis of a number of test
scores, the admission or rejection of a student has to be decided. In all such
problems it is assumed that there are two populations, say $\pi_1$ and $\pi_2$, one representing
the population of individuals fit, and the other the population of individuals
unfit for the purpose under consideration. The problem is that of classifying
an individual into one of the populations $\pi_1$ and $\pi_2$ on the basis of his test scores.
Often, some statistical data from past experience are available which can be
utilized in making the classification. Suppose that from past experience we
have the test scores of $N_1$ individuals who are known to belong to population $\pi_1$,
and also the test scores of $N_2$ individuals who are known to belong to population
$\pi_2$. These data will be utilized in classifying a new individual on the basis of
his test scores.

In this paper we shall deal with the statistical problem of classifying an individual
into one of the populations $\pi_1$ and $\pi_2$ on the basis of his test scores and
on the basis of past experience, given in the form of two samples, one drawn
from $\pi_1$ and the other from $\pi_2$. In the next section we give a precise formulation
of the statistical problem and state the assumptions we make about the populations
$\pi_1$ and $\pi_2$.


2. Statement of the problem. We consider two sets of $p$ variates $(x_1, \dots, x_p)$
and $(y_1, \dots, y_p)$. It is assumed that each of the sets $(x_1, \dots, x_p)$ and
$(y_1, \dots, y_p)$ has a $p$-variate normal distribution and the two sets are independent
of each other. It is furthermore assumed that the covariance matrix
of the variates $x_1, \dots, x_p$ is equal to the covariance matrix of the variates
$y_1, \dots, y_p$, i.e. $\sigma_{x_i x_j} = \sigma_{y_i y_j}$ ($i, j = 1, \dots, p$). We will denote this common
covariance by $\sigma_{ij}$. Let us denote the mean value of $x_i$ by $\mu_i$ and the mean value
of $y_i$ by $\nu_i$. Furthermore we will denote the normal population with mean
values $\mu_1, \dots, \mu_p$ and covariance matrix $\| \sigma_{ij} \|$ by $\pi_1$, and the normal population
with mean values $\nu_1, \dots, \nu_p$ and covariance matrix $\| \sigma_{ij} \|$ by $\pi_2$.

A sample of size $N_1$ is drawn from the population $\pi_1$ and a sample of size $N_2$ is
drawn from the population $\pi_2$. Denote by $x_{i\alpha}$ the $\alpha$-th observation on $x_i$
($i = 1, \dots, p$; $\alpha = 1, \dots, N_1$) and by $y_{i\beta}$ the $\beta$-th observation on $y_i$ ($i = 1, \dots, p$;
$\beta = 1, \dots, N_2$). Let $z_i$ ($i = 1, \dots, p$) be a single observation on the $i$-th
variate drawn from a $p$-variate population $\pi$, where it is known a priori that $\pi$ is
either identical with $\pi_1$ or with $\pi_2$. The set $(z_1, \dots, z_p)$ is assumed to be distributed
independently of $(x_1, \dots, x_p)$ and $(y_1, \dots, y_p)$.

¹ The author wishes to thank Dr. Irving Lorge, Columbia University, for calling his
attention to this problem.

We will deal here with the following statistical problem: On the basis of the
observations $x_{i\alpha}$, $y_{i\beta}$, $z_i$ ($i = 1, \dots, p$; $\alpha = 1, \dots, N_1$; $\beta = 1, \dots, N_2$) we
test the hypothesis $H_1$ that the population $\pi$, from which the set $(z_1, \dots, z_p)$ has
been drawn, is equal to $\pi_1$. The parameters $\mu_1, \dots, \mu_p$, $\nu_1, \dots, \nu_p$ and
$\| \sigma_{ij} \|$ are assumed to be unknown.


3. The statistic to be used for testing the hypothesis $H_1$. In this problem
there exists only a single alternative hypothesis to the 0-hypothesis $H_1$ to be
tested, i.e. the hypothesis $H_2$ that $\pi$ is equal to $\pi_2$. If the parameters $\mu_1, \dots,
\mu_p$, $\nu_1, \dots, \nu_p$ and $\| \sigma_{ij} \|$ were known we could easily find (on the basis of a
lemma by Neyman and Pearson) the critical region which is most powerful with
respect to the alternative $H_2$. Let us assume for the moment that the parameters
$\mu_1, \dots, \mu_p$, $\nu_1, \dots, \nu_p$ and $\| \sigma_{ij} \|$ are known and let us compute the
critical region for testing $H_1$ which is most powerful with respect to the alternative
$H_2$. According to a lemma by Neyman and Pearson² this critical region is
given by the inequality

(1)  $\dfrac{p_2(z_1, \dots, z_p)}{p_1(z_1, \dots, z_p)} \ge k,$

where $p_1(z_1, \dots, z_p)$ denotes the joint probability density function of $z_1, \dots, z_p$
under the hypothesis $H_1$, $p_2(z_1, \dots, z_p)$ denotes the joint probability density
function of $(z_1, \dots, z_p)$ under the hypothesis $H_2$, and $k$ is a constant determined
so that the critical region should have the required size.

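Denote the determinant value $| \sigma_{ij} |$ of the matrix $\| \sigma_{ij} \|$ by $\sigma$. Then

(2)  $p_1(z_1, \dots, z_p) = \dfrac{1}{(2\pi)^{p/2} \sigma^{1/2}} \exp\Bigl\{-\tfrac{1}{2} \sum_{j=1}^{p} \sum_{i=1}^{p} \sigma^{ij} (z_i - \mu_i)(z_j - \mu_j)\Bigr\}$

and

(3)  $p_2(z_1, \dots, z_p) = \dfrac{1}{(2\pi)^{p/2} \sigma^{1/2}} \exp\Bigl\{-\tfrac{1}{2} \sum_{j=1}^{p} \sum_{i=1}^{p} \sigma^{ij} (z_i - \nu_i)(z_j - \nu_j)\Bigr\}$

where the matrix $\| \sigma^{ij} \|$ denotes the inverse matrix of the matrix $\| \sigma_{ij} \|$. Taking
logarithms of both sides of the inequality (1), we obtain the inequality

(4)  $-\tfrac{1}{2} \Bigl\{\sum_j \sum_i \sigma^{ij} \bigl[(z_i - \nu_i)(z_j - \nu_j) - (z_i - \mu_i)(z_j - \mu_j)\bigr]\Bigr\} \ge \log k.$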


² J. Neyman and E. S. Pearson, “Contributions to the theory of testing statistical
hypotheses,” Stat. Res. Mem., Vol. 1, London, 1936.







Multiplying both sides of (4) by 2, we have

(5)  $\sum_j \sum_i \sigma^{ij} \bigl[(z_i - \mu_i)(z_j - \mu_j) - (z_i - \nu_i)(z_j - \nu_j)\bigr] \ge 2 \log k.$

The critical region (5) is most powerful with respect to the alternative $H_2$, but
it cannot be used for our purposes since the parameters $\mu_1, \dots, \mu_p$, $\nu_1, \dots, \nu_p$
and $\| \sigma_{ij} \|$ are unknown. The optimum estimate of $\sigma_{ij}$ on the basis of the observations
$x_{i\alpha}$ and $y_{i\beta}$ is given by the sample covariance


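(6)  $s_{ij} = \dfrac{\sum_{\alpha=1}^{N_1} (x_{i\alpha} - \bar{x}_i)(x_{j\alpha} - \bar{x}_j) + \sum_{\beta=1}^{N_2} (y_{i\beta} - \bar{y}_i)(y_{j\beta} - \bar{y}_j)}{N_1 + N_2 - 2},$

where $\bar{x}_i = \dfrac{\sum_\alpha x_{i\alpha}}{N_1}$ and $\bar{y}_i = \dfrac{\sum_\beta y_{i\beta}}{N_2}$. The optimum estimates of $\mu_i$ and $\nu_i$ are
given by $\bar{x}_i$ and $\bar{y}_i$ respectively ($i = 1, \dots, p$). Hence for testing $H_1$ it seems
reasonable to use the statistic $R$ which we obtain from the left hand side of (5)
by substituting the optimum estimates for the unknown parameters. Thus $R$ is
given by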


(7) R= DU sl — We; -— 8) — & — WE; - Td) 


' 17 1 —l 
where |} s'” || = || si; || 


equality 


(8) R> Cc, 


The critical region for testing H, is given by the in- 


where $C$ is a constant determined in such a way that the critical region should
have the required size. It is interesting to notice that $R$ is proportional to the
difference $T_1 - T_2$ where $T_i$ ($i = 1, 2$) denotes the generalized Student's ratio³
for testing the hypothesis that the set $(z_1, \dots, z_p)$ is drawn from the population
$\pi_i$. In our case the statistic $T_1$ cannot be used for testing $H_1$, since $T_1$ is appropriate
for this purpose if the class of alternative hypotheses contains all $p$-variate
normal populations having the same covariance matrix as $\pi_1$. In our case the
class of alternatives consists merely of a single alternative, namely, the alternative
$\pi_2$.

³ See in this connection H. Hotelling, “The generalization of Student's ratio,” Annals
of Math. Stat., Vol. 2, and R. C. Bose and S. N. Roy, “The exact distribution of the Studentized
$D^2$ statistic,” Sankhya, Vol. 3.

For the sake of certain simplifications we shall propose the use of a statistic $U$
which differs slightly from the statistic $R$. In order to obtain $U$, we consider the
inequality (5). Since $\sigma^{ij} = \sigma^{ji}$ this inequality can be reduced to

(9)  $\sum_j \sum_i \sigma^{ij} z_i (\nu_j - \mu_j) \ge k',$

where $k'$ denotes a certain constant. The statistic $U$ is obtained from the left
hand side of (9) by substituting the optimum estimates for the unknown parameters.
Thus

(10)  $U = \sum_j \sum_i s^{ij} z_i (\bar{y}_j - \bar{x}_j),$
and the critical region is given by the inequality

(11)  $U \ge d,$

where the constant $d$ is chosen so that the critical region should have the required
size. The statistic $U$ differs from $R$ merely by a term which does not
depend on the quantities $z_1, \dots, z_p$. If $N_1$ and $N_2$ are large the difference
$U - R$ is practically constant and therefore the critical regions (8) and (11) are
identical. The use of $U$ seems to be as justifiable as that of $R$ and because of
certain simplifications we propose the use of the critical region (11).
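The computation of $U$ in (10) from two training samples and a new score vector can be sketched as follows. This is an illustration rather than the author's procedure: it assumes the pooled covariance (6) with divisor $N_1 + N_2 - 2$, solves the linear system $\| s_{ij} \| w = \bar{y} - \bar{x}$ by elimination instead of forming the inverse matrix, and uses hypothetical function names and made-up data.

```python
def mean_vector(rows):
    """Column means of a list of equal-length observation vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def pooled_covariance(xs, ys):
    """Pooled sample covariance matrix with divisor N1 + N2 - 2."""
    p = len(xs[0])
    dof = len(xs) + len(ys) - 2
    s = [[0.0] * p for _ in range(p)]
    for rows, m in ((xs, mean_vector(xs)), (ys, mean_vector(ys))):
        for r in rows:
            for i in range(p):
                for j in range(p):
                    s[i][j] += (r[i] - m[i]) * (r[j] - m[j])
    return [[v / dof for v in row] for row in s]

def solve(matrix, rhs):
    """Solve matrix * w = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def statistic_U(z, xs, ys):
    """U = sum over i, j of s^{ij} z_i (ybar_j - xbar_j)."""
    xb, yb = mean_vector(xs), mean_vector(ys)
    d = [yb[j] - xb[j] for j in range(len(z))]
    w = solve(pooled_covariance(xs, ys), d)  # w = S^{-1} d (S is symmetric)
    return sum(zi * wi for zi, wi in zip(z, w))

xs = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [2.0, 3.0]]  # sample from the first population
ys = [[4.0, 5.0], [5.0, 4.0], [6.0, 7.0], [5.0, 6.0]]  # sample from the second population
print(statistic_U([3.5, 4.0], xs, ys))
```

Large values of $U$ favor the second population, in line with the critical region (11).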

The statistic $U$ is closely connected with the so called discriminant function⁴
introduced by R. A. Fisher for discriminating between the two populations $\pi_1$
and $\pi_2$. The discriminant function $D$ is given by

(12)  $D = b_1 d_1 + b_2 d_2 + \cdots + b_p d_p,$

where $d_i = \bar{y}_i - \bar{x}_i$ and the coefficient $b_i$ is proportional to $\sum_{j=1}^{p} s^{ij} d_j$. The coefficients
$b_1, \dots, b_p$ are called the coefficients of the discriminant function.
We see that $U$ is proportional to the statistic $\sum_{i=1}^{p} b_i z_i$ which is obtained from the
right hand side of (12) by substituting $z_i$ for $d_i$.


4. Solution of the problem when $N_1$ and $N_2$ are large. Denote by $F(U, N_1,
N_2 \mid \pi_i)$ the cumulative probability distribution of $U$ under the hypothesis that
the set $(z_1, \dots, z_p)$ has been drawn from the population $\pi_i$ ($i = 1, 2$). If $N_1$
and $N_2$ approach infinity the distribution $F(U, N_1, N_2 \mid \pi_i)$ converges to a normal
distribution, since the variates $s_{ij}$, $\bar{x}_i$ and $\bar{y}_i$ converge stochastically to the constants
$\sigma_{ij}$, $\mu_i$ and $\nu_i$ respectively ($i, j = 1, \dots, p$). Let us denote
$\lim_{N_1 = N_2 = \infty} F(U, N_1, N_2 \mid \pi_i)$ by $\Phi(U \mid \pi_i)$ ($i = 1, 2$). Furthermore denote by $\alpha_i$ the mean value,
and by $\sigma_i$ the standard deviation of the distribution $\Phi(U \mid \pi_i)$ ($i = 1, 2$). It is
obvious that $\sigma_1 = \sigma_2 = \sigma$ (say). It is easy to verify that the variates

(13)  $\sum_j \sum_i s^{ij} \bar{x}_i (\bar{y}_j - \bar{x}_j),$

(14)  $\sum_j \sum_i s^{ij} \bar{y}_i (\bar{y}_j - \bar{x}_j),$

(15)  $\sqrt{\sum_{i=1}^{p} \sum_{j=1}^{p} \sum_{k=1}^{p} \sum_{l=1}^{p} s^{ki} s^{lj} (\bar{y}_k - \bar{x}_k)(\bar{y}_l - \bar{x}_l) s_{ij}} = \sqrt{\sum_{k=1}^{p} \sum_{l=1}^{p} s^{kl} (\bar{y}_k - \bar{x}_k)(\bar{y}_l - \bar{x}_l)},$

converge stochastically to the constants $\alpha_1$, $\alpha_2$ and $\sigma$ respectively.


⁴ R. A. Fisher, “The statistical utilization of multiple measurements,” Annals of Eugenics,
1938.







Hence for large values of $N_1$ and $N_2$ we can assume that $U$ is normally distributed
with mean value $\alpha_i$ and standard deviation $\sigma$ if the hypothesis $H_i$ ($i =
1, 2$) is true. Thus the critical region for testing $H_1$ is given by the inequality

(16)  $U \ge \alpha_1 + \lambda\sigma,$

where the constant $\lambda$ is chosen in such a way that $\dfrac{1}{\sqrt{2\pi}} \int_\lambda^\infty e^{-t^2/2}\, dt$ is equal to the
required size of the critical region.

Finally, some remarks about the proper choice of the size of the critical region
may be of interest. Two kinds of error may be committed. $H_1$ may be rejected
when it is true, and $H_1$ may be accepted when $H_2$ is true. Suppose that $W_1$ and
$W_2$ are two positive numbers expressing the importance of an error of the first
kind and an error of the second kind respectively. If the purpose of the statistical
investigation is given it will usually be possible to determine the values of
$W_1$ and $W_2$. We shall deal here with the question of determining the size of the
critical region as a function of the weights $W_1$ and $W_2$. Denote by $P_i$ the probability
that (16) holds under the assumption that $H_i$ is true ($i = 1, 2$). Then
$P_1$ is the size of the critical region (also the probability of an error of the first
kind), and $1 - P_2$ is the probability of an error of the second kind. Both probabilities
$P_1$ and $P_2$ are functions of $\lambda$ and are given by the following expressions:


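(17)  $P_1 = \dfrac{1}{\sqrt{2\pi}} \int_\lambda^\infty e^{-t^2/2}\, dt,$

and

(18)  $P_2 = \dfrac{1}{\sqrt{2\pi}} \int_{(\alpha_1 - \alpha_2)/\sigma + \lambda}^\infty e^{-t^2/2}\, dt.$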


From (13) and (14) we obtain

(19)  $\alpha_2 - \alpha_1 = \sum_j \sum_i \sigma^{ij} (\nu_i - \mu_i)(\nu_j - \mu_j).$

Since the right hand side of (19) is positive definite, we have $\alpha_2 > \alpha_1$. Hence
because of (17) and (18) we also have $P_2 > P_1$. By the risk of committing a
certain error we understand the probability of that error multiplied by its
weight. Hence the risk of committing an error of the first kind is given by $W_1 P_1$,
and the risk of committing an error of the second kind is given by $W_2(1 - P_2)$.
It seems reasonable to choose the value of $\lambda$ so that the two risks become equal
to each other, i.e. such that

(20)  $W_1 P_1 = W_2 (1 - P_2).$


Hence using (17) and (18) we obtain the following equation in $\lambda$

(21)  $W_1 \dfrac{1}{\sqrt{2\pi}} \int_\lambda^\infty e^{-t^2/2}\, dt - W_2 \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{(\alpha_1 - \alpha_2)/\sigma + \lambda} e^{-t^2/2}\, dt = 0.$







Using a table of the normal distribution, the value of $\lambda$ which satisfies the equation
(21) can easily be found. For $W_1 = W_2$ the solution of (21) is given by

$\lambda = \dfrac{\alpha_2 - \alpha_1}{2\sigma},$

and the critical region is given by the inequality

$U \ge \alpha_1 + \lambda\sigma = \dfrac{\alpha_1 + \alpha_2}{2}.$
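Where the text suggests a table of the normal distribution, equation (21) can also be solved numerically. The following is a minimal sketch, assuming that $\alpha_1$, $\alpha_2$ and $\sigma$ are already known (or replaced by their estimates) and using bisection on the left hand side of (21); the function name and the bracketing interval are illustrative choices.

```python
from statistics import NormalDist

def balanced_lambda(w1, w2, alpha1, alpha2, sigma):
    """Solve W1*P1 = W2*(1 - P2) for lambda by bisection."""
    cdf = NormalDist().cdf
    shift = (alpha1 - alpha2) / sigma

    def g(lam):
        # Left hand side of (21): W1 * P1 - W2 * (1 - P2); decreasing in lam.
        return w1 * (1.0 - cdf(lam)) - w2 * cdf(shift + lam)

    lo, hi = -50.0, 50.0  # g(lo) > 0 > g(hi) for moderate values of shift
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Equal weights: lambda should equal (alpha2 - alpha1) / (2 * sigma).
print(balanced_lambda(1.0, 1.0, 0.0, 2.0, 1.0))
# A heavier weight on the first kind of error pushes lambda upward.
print(balanced_lambda(4.0, 1.0, 0.0, 2.0, 1.0))
```

Bisection is used because the left hand side of (21) is monotone in $\lambda$, so the root is unique.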


5. Some results concerning the exact sampling distribution of the statistic $U$.
If $N_1$ and $N_2$ are not large the solution given in section 4 cannot be used and it
is necessary to derive the exact sampling distribution of $U$. Let

(22)  $z_i' = (\bar{y}_i - \bar{x}_i) \sqrt{\dfrac{N_1 N_2}{N_1 + N_2}}.$

Then

(23)  $U = \sqrt{\dfrac{N_1 + N_2}{N_1 N_2}} \sum_j \sum_i s^{ij} z_i z_j',$


where the variates $z_1', \dots, z_p'$ are distributed independently of the set $(z_1, \dots,
z_p)$, the mean value of $z_i'$ is equal to $(\nu_i - \mu_i) \sqrt{\dfrac{N_1 N_2}{N_1 + N_2}}$ and the covariance
between $z_i'$ and $z_j'$ is equal to $\sigma_{ij}$. It is known that the set of covariances $s_{ij}$ is
distributed independently of the set $(z_1, \dots, z_p, z_1', \dots, z_p')$ and therefore the
distribution of $U$ remains unchanged if instead of (6) we have

(24)  $s_{ij} = \dfrac{\sum_{\alpha=1}^{n} t_{i\alpha} t_{j\alpha}}{n} \qquad (n = N_1 + N_2 - 2),$

where the variates $t_{i\alpha}$ are distributed independently of the set $(z_1, \dots, z_p,
z_1', \dots, z_p')$, have a joint normal distribution with mean values zero, $\sigma_{t_{i\alpha} t_{j\alpha}} = \sigma_{ij}$
and $\sigma_{t_{i\alpha} t_{j\beta}} = 0$ if $\alpha \ne \beta$. It is necessary to derive the distribution of $U$ under
both hypotheses $H_1$ and $H_2$. In both cases the mean values of $z_1, \dots, z_p$,
$z_1', \dots, z_p'$ are not zero. Instead of $U$ we will consider the statistic


$U' = \sum_{i=1}^{p} \sum_{j=1}^{p} s^{ij} z_i z_j',$

which differs from $U$ only in the proportionality factor $\sqrt{\dfrac{N_1 + N_2}{N_1 N_2}}$. The distributions
of $U'$ under the hypotheses $H_1$ and $H_2$ are contained as special cases in
the distribution of the statistic

(25)  $V = \sum_j \sum_i s^{ij} t_{i,n+1} t_{j,n+2},$
j i 







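where $s_{ij}$ is given by (24) and the joint distribution of the variates $t_{i\beta}$ ($i = 1, \dots, p$;
$\beta = 1, \dots, n + 2$) is given by

(26)  $\dfrac{1}{(2\pi)^{p(n+2)/2} \sigma^{(n+2)/2}} \exp\Bigl\{-\tfrac{1}{2} \sum_{j=1}^{p} \sum_{i=1}^{p} \sigma^{ij} \Bigl[\sum_{\alpha=1}^{n} t_{i\alpha} t_{j\alpha} + (t_{i,n+1} - \xi_i)(t_{j,n+1} - \xi_j) + (t_{i,n+2} - \eta_i)(t_{j,n+2} - \eta_j)\Bigr]\Bigr\} \prod_{\beta=1}^{n+2} \prod_{i=1}^{p} dt_{i\beta}.$

The quantities $\xi_1, \dots, \xi_p$, $\eta_1, \dots, \eta_p$ are constants and $\sigma$ denotes the determinant
value of the matrix $\| \sigma_{ij} \|$.

We will deal here with the distribution of the statistic $V$ given in (25) under
the assumption that the joint distribution of the variates $t_{i\beta}$ ($i = 1, \dots, p$;
$\beta = 1, \dots, n + 2$) is given by (26).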

In order to derive the distribution of V we shall have to prove several lemmas. 


Lemma 1. Let $\| \lambda_{ij} \|$ ($i, j = 1, \dots, p$) be an arbitrary non-singular matrix,
and let

$t_{i\beta}' = \sum_{j=1}^{p} \lambda_{ij} t_{j\beta} \qquad (i = 1, \dots, p;\ \beta = 1, \dots, n + 2).$

Let furthermore $s_{ij}'$ be given by

$s_{ij}' = \dfrac{\sum_{\alpha=1}^{n} t_{i\alpha}' t_{j\alpha}'}{n}.$

Then $\sum_j \sum_i s'^{ij} t_{i,n+1}' t_{j,n+2}' = \sum_j \sum_i s^{ij} t_{i,n+1} t_{j,n+2}$, i.e. the statistic $V$ is invariant
under non-singular linear transformations.
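Proof. We obviously have

(27)  $t_{i,n+1}' t_{j,n+2}' = \sum_{k=1}^{p} \sum_{l=1}^{p} \lambda_{ik} \lambda_{jl} t_{k,n+1} t_{l,n+2}.$

Furthermore we have

(28)  $s_{ij}' = \sum_{k=1}^{p} \sum_{l=1}^{p} \lambda_{ik} \lambda_{jl} s_{kl}.$

Hence

(29)  $\| s_{ij}' \| = \| \lambda_{ij} \| \, \| s_{ij} \| \, \| \bar{\lambda}_{ij} \|,$

where $\bar{\lambda}_{ij} = \lambda_{ji}$. From (29) we obtain

(30)  $\| s'^{ij} \| = \| \bar{\lambda}^{ij} \| \, \| s^{ij} \| \, \| \lambda^{ij} \|,$

where $\| \lambda^{ij} \| = \| \lambda_{ij} \|^{-1}$ and $\| \bar{\lambda}^{ij} \| = \| \bar{\lambda}_{ij} \|^{-1}$, so that $\bar{\lambda}^{ij} = \lambda^{ji}$, and therefore

(31)  $s'^{ij} = \sum_{k=1}^{p} \sum_{l=1}^{p} \lambda^{ki} \lambda^{lj} s^{kl}.$

Hence from (27) and (31) we obtain

(32)  $\sum_j \sum_i s'^{ij} t_{i,n+1}' t_{j,n+2}' = \sum_j \sum_i \sum_k \sum_l \sum_u \sum_v s^{kl} \lambda^{ki} \lambda^{lj} \lambda_{iu} \lambda_{jv} t_{u,n+1} t_{v,n+2}.$

The coefficient of $t_{u,n+1} t_{v,n+2}$ on the right hand side of (32) is given by

(33)  $\sum_j \sum_i \sum_k \sum_l s^{kl} \lambda^{ki} \lambda^{lj} \lambda_{iu} \lambda_{jv} = \sum_k \sum_l s^{kl} \Bigl(\sum_i \lambda^{ki} \lambda_{iu}\Bigr) \Bigl(\sum_j \lambda^{lj} \lambda_{jv}\Bigr) = s^{uv}.$

Lemma 1 follows from (32) and (33).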
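The invariance stated in Lemma 1 can be illustrated numerically for $p = 2$. The sketch below uses made-up random values for the $t_{i\beta}$, an arbitrary non-singular matrix, and hypothetical helper names; the first $n$ columns of `t` play the role of the variates entering $s_{ij}$, and the last two columns play the roles of $t_{i,n+1}$ and $t_{i,n+2}$.

```python
import random

def inv2(m):
    """Inverse of a non-singular 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def statistic_V(t, n):
    """V = sum over i, j of s^{ij} t_{i,n+1} t_{j,n+2}, with s_ij built from the first n columns."""
    s = [[sum(t[i][k] * t[j][k] for k in range(n)) / n for j in range(2)] for i in range(2)]
    s_inv = inv2(s)
    return sum(s_inv[i][j] * t[i][n] * t[j][n + 1] for i in range(2) for j in range(2))

def transform(lam, t):
    """Apply t'_{i,beta} = sum_j lam[i][j] * t[j][beta] to every column beta."""
    cols = len(t[0])
    return [[sum(lam[i][j] * t[j][k] for j in range(2)) for k in range(cols)] for i in range(2)]

rng = random.Random(0)
n = 6
t = [[rng.gauss(0.0, 1.0) for _ in range(n + 2)] for _ in range(2)]
lam = [[2.0, 1.0], [-1.0, 3.0]]  # arbitrary non-singular matrix (determinant 7)

v_before = statistic_V(t, n)
v_after = statistic_V(transform(lam, t), n)
print(v_before, v_after)
```

The two printed values should coincide up to rounding, whatever non-singular matrix is chosen.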

Lemma 2. The distribution of $V$ remains unchanged if we assume that the covariance
matrix $\| \sigma_{ij} \|$ is equal to the unit matrix, i.e. the joint distribution of the
variates $t_{i\beta}$ ($i = 1, \dots, p$; $\beta = 1, \dots, n + 2$) is given by

(34)  $\dfrac{1}{(2\pi)^{p(n+2)/2}} \exp\Bigl\{-\tfrac{1}{2}\Bigl[\sum_{i=1}^{p} \sum_{\alpha=1}^{n} t_{i\alpha}^2 + \sum_i (t_{i,n+1} - \rho_i)^2 + \sum_i (t_{i,n+2} - \zeta_i)^2\Bigr]\Bigr\},$

where the constants $\rho_i$ and $\zeta_i$ are functions of the constants $\xi_1, \dots, \xi_p$, $\eta_1, \dots, \eta_p$
and of the $\sigma_{ij}$.

Lemma 2 is an immediate consequence of Lemma 1. Hence we have to derive
the distribution of $V$ under the assumption that the variates $t_{i\beta}$ have the joint
distribution given in (34).

Let $R_i$ ($i = 1, \dots, p$) be the point of the $n + 2$ dimensional Cartesian space
with the coordinates $t_{i1}, \dots, t_{i,n+2}$. Let $P = (u_1, \dots, u_{n+2})$ and $Q =
(v_1, \dots, v_{n+2})$ be two arbitrary points such that $\sum_{\beta=1}^{n+2} u_\beta v_\beta = 0$ and $\sum u_\beta^2 = \sum v_\beta^2 = 1$.
Denote by $O$ the origin of the coordinate system and let $\bar{t}_{i,n+1}$ be the projection
of the vector $OR_i$ on the vector $OP$. We have

(35)  $\bar{t}_{i,n+1} = \sum_{\beta=1}^{n+2} t_{i\beta} u_\beta \qquad (i = 1, \dots, p).$

Similarly, the projection $\bar{t}_{i,n+2}$ of the vector $OR_i$ on $OQ$ is given by

(36)  $\bar{t}_{i,n+2} = \sum_{\beta=1}^{n+2} t_{i\beta} v_\beta.$

Let $\bar{R}_i$ ($i = 1, \dots, p$) be the projection of the point $R_i$ on the $n$-dimensional
hyperplane through $O$ and perpendicular to the vectors $OP$ and $OQ$. Denote
the coordinates of $\bar{R}_i$ by $r_{i1}, \dots, r_{i,n+2}$ respectively and let $\bar{s}_{ij}$ be defined by

(37)  $\bar{s}_{ij} = \dfrac{\sum_{\beta=1}^{n+2} r_{i\beta} r_{j\beta}}{n}.$

If we rotate the coordinate system so that the $(n + 1)$-axis coincides with $OP$
and the $(n + 2)$-axis coincides with $OQ$, and if $\bar{t}_{i1}, \dots, \bar{t}_{i,n+2}$ denote the coordinates
of $R_i$ ($i = 1, \dots, p$) referred to the new system, then we have

(38)  $\bar{s}_{ij} = \dfrac{1}{n} \sum_{\beta=1}^{n+2} r_{i\beta} r_{j\beta} = \dfrac{1}{n} \sum_{\alpha=1}^{n} \bar{t}_{i\alpha} \bar{t}_{j\alpha},$ and

(39)  $\sum_{\beta=1}^{n+2} t_{i\beta} t_{j\beta} = \sum_{\beta=1}^{n+2} \bar{t}_{i\beta} \bar{t}_{j\beta}.$


From (38) and (39) we obtain

(40)  $\bar s_{ij} = \dfrac{1}{n}\Big(\sum_{\beta=1}^{n+2} t_{i\beta} t_{j\beta} - \bar t_{i,n+1}\bar t_{j,n+1} - \bar t_{i,n+2}\bar t_{j,n+2}\Big)$.
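As a numerical illustration of (37) and (40), the following minimal sketch draws an arbitrary configuration and checks that the two expressions for $\bar s_{ij}$ agree. The function names, the random test data and the use of Gram-Schmidt to build $u$ and $v$ are assumptions made for the illustration; they are not part of the paper.

```python
import math
import random


def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def orthonormal_pair(dim, rng):
    """Two orthonormal vectors u, v in R^dim (Gram-Schmidt on random draws)."""
    u = [rng.gauss(0, 1) for _ in range(dim)]
    norm_u = math.sqrt(dot(u, u))
    u = [x / norm_u for x in u]
    w = [rng.gauss(0, 1) for _ in range(dim)]
    c = dot(w, u)
    w = [x - c * y for x, y in zip(w, u)]
    norm_w = math.sqrt(dot(w, w))
    return u, [x / norm_w for x in w]


def s_bar_via_37(t, u, v, n):
    """(37): inner products of the projections on the hyperplane orthogonal to OP, OQ."""
    resid = []
    for row in t:
        a, b = dot(row, u), dot(row, v)
        resid.append([x - a * y - b * z for x, y, z in zip(row, u, v)])
    return [[dot(ri, rj) / n for rj in resid] for ri in resid]


def s_bar_via_40(t, u, v, n):
    """(40): the same quantity from the original coordinates and (35), (36)."""
    proj_u = [dot(row, u) for row in t]  # (35)
    proj_v = [dot(row, v) for row in t]  # (36)
    p = len(t)
    return [[(dot(t[i], t[j]) - proj_u[i] * proj_u[j] - proj_v[i] * proj_v[j]) / n
             for j in range(p)] for i in range(p)]


rng = random.Random(0)
n, p = 6, 3
t = [[rng.gauss(0, 1) for _ in range(n + 2)] for _ in range(p)]
u, v = orthonormal_pair(n + 2, rng)
A = s_bar_via_37(t, u, v, n)
B = s_bar_via_40(t, u, v, n)
max_diff = max(abs(A[i][j] - B[i][j]) for i in range(p) for j in range(p))
print(max_diff)  # expected to be at rounding-error level
```

The agreement rests only on $u$ and $v$ being orthonormal, which is the condition imposed on $P$ and $Q$ above.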
We will now prove

Lemma 3. Let $\bar V$ be defined by

(41)  $\bar V = \sum_{i=1}^{p}\sum_{j=1}^{p} \bar s^{ij}\, \bar t_{i,n+1}\, \bar t_{j,n+2}$,

where $\bar t_{i,n+1}$, $\bar t_{i,n+2}$ and $\bar s_{ij}$ are given by the formulas (35), (36) and (40) respectively. Let furthermore the joint probability distribution of the variates $t_{i\beta}$ ($i = 1, \dots, p$; $\beta = 1, \dots, n+2$) be given by

(42)  $\dfrac{1}{(2\pi)^{p(n+2)/2}} \exp\Big\{-\tfrac12 \sum_{i=1}^{p}\sum_{\beta=1}^{n+2} (t_{i\beta} - \rho_i u_\beta - \zeta_i v_\beta)^2\Big\} \prod_{i}\prod_{\beta} dt_{i\beta}$.

Then the distribution of $\bar V$ calculated under the assumption that the quantities $u_1, \dots, u_{n+2}, v_1, \dots, v_{n+2}$ are constants and the joint probability distribution of the variates $t_{i\beta}$ is given by (42), is the same as the distribution of $V$ calculated under the assumption that the joint probability distribution of the variates $t_{i\beta}$ is given by (34).
Proof. If we rotate the coordinate system so that the $(n+1)$-axis coincides with $OP$ and the $(n+2)$-axis coincides with $OQ$, and if $t'_{i1}, \dots, t'_{i,n+2}$ denote the coordinates of $R_i$ ($i = 1, \dots, p$) in the new system, then $t'_{i,n+1}$ and $t'_{i,n+2}$ are given by the right hand sides of (35) and (36) respectively. Furthermore, by (38),

$\bar s_{ij} = \dfrac{1}{n}\sum_{\alpha=1}^{n} t'_{i\alpha} t'_{j\alpha}$.

Hence the distribution of $\bar V$ is certainly the same as that of $V$ if the joint probability distribution of the variates $t'_{i\beta}$ ($i = 1, \dots, p$; $\beta = 1, \dots, n+2$) is given by the expression which we obtain from (34) by substituting $t'_{i\beta}$ for $t_{i\beta}$. Thus, in order to prove Lemma 3 we have merely to show that if the variates $t'_{i\beta}$ have the joint probability distribution (34), the variates $t_{i\beta}$ have the joint probability distribution (42). Since the variates $t_{i1}, \dots, t_{i,n+2}$ are obtained by an orthogonal transformation of the variates $t'_{i1}, \dots, t'_{i,n+2}$, it follows that the variates $t_{i\beta}$ ($i = 1, \dots, p$; $\beta = 1, \dots, n+2$) are independently and normally distributed with unit variances. We have

(43)  $t_{i\beta} = \sum_{\gamma=1}^{n+2} \lambda_{\beta\gamma}\, t'_{i\gamma}$,

where $\lambda_{\beta\gamma}$ is equal to the cosine of the angle between the $\beta$-th axis of the original system and the $\gamma$-th axis of the new system. Since

$\lambda_{\beta,n+1} = u_\beta$ and $\lambda_{\beta,n+2} = v_\beta$,

and since $E(t'_{i\gamma}) = 0$ for $\gamma = 1, \dots, n$, $E(t'_{i,n+1}) = \rho_i$ and $E(t'_{i,n+2}) = \zeta_i$, it follows from (43) that

(44)  $E(t_{i\beta}) = \rho_i u_\beta + \zeta_i v_\beta$.

Hence Lemma 3 is proved.

We will now prove


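Lemma 4. Let $P$ be a point with the coordinates $u_1, \dots, u_{n+2}$ and $Q$ a point with the coordinates $v_1, \dots, v_{n+2}$ such that $\sum u_\beta v_\beta = 0$ and $\sum u_\beta^2 = \sum v_\beta^2 = 1$. Denote by $L_p$ the flat space determined by the vectors $OR_1, \dots, OR_p$ ($R_i = (t_{i1}, \dots, t_{i,n+2})$) and let $\bar P$ be the projection of $P$ on $L_p$ and $\bar Q$ the projection of $Q$ on $L_p$. Denote furthermore by $\theta_1$ the angle between the vectors $OP$ and $O\bar P$, by $\theta_1'$ the angle between $O\bar P$ and $OQ$, by $\theta_2$ the angle between $OQ$ and $O\bar Q$, by $\theta_2'$ the angle between $O\bar Q$ and $OP$, and finally by $\theta_3$ the angle between $O\bar P$ and $O\bar Q$. Then the statistic $\bar V$ defined in (41) is equal to

(45)  $-\dfrac{\begin{vmatrix} 0 & a_1 & a_2 \\ b_1 & a_{11} & a_{12} \\ b_2 & a_{12} & a_{22} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{vmatrix}}$,

where

(46)  $a_1 = \cos^2\theta_1$; $a_2 = \cos\theta_2 \cos\theta_2'$; $b_1 = \cos\theta_1 \cos\theta_1'$; $b_2 = \cos^2\theta_2$;

(47)  $a_{11} = \dfrac{\cos^2\theta_1 - a_1^2 - b_1^2}{n}$, $a_{22} = \dfrac{\cos^2\theta_2 - a_2^2 - b_2^2}{n}$, and $a_{12} = \dfrac{\cos\theta_1\cos\theta_2\cos\theta_3 - a_1 a_2 - b_1 b_2}{n}$.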


Proof. If we rotate the coordinate system in such a way that the $(n+1)$-axis coincides with $OP$ and the $(n+2)$-axis coincides with $OQ$, and if $t'_{i1}, \dots, t'_{i,n+2}$ are the coordinates of $R_i$ in the new system, then

$\bar s_{ij} = \dfrac{1}{n}\sum_{\alpha=1}^{n} t'_{i\alpha} t'_{j\alpha}$.

According to Lemma 1 the statistic $V$ is invariant under linear transformations of the variables $t_{i\beta}$. Hence $\bar V$ is also invariant under linear transformations of the variables $t'_{i\beta}$. Thus the value of $\bar V$ remains unchanged if the points $R_1, \dots, R_p$ are replaced by arbitrary points $R_1', \dots, R_p'$ of $L_p$ subject to the condition that the vectors $OR_1', \dots, OR_p'$ be linearly independent. Hence we may assume that the vectors $OR_3, \dots, OR_p$ are perpendicular to each other and lie in the intersection of $L_p$ with the $n$-dimensional flat space which goes through $O$ and is perpendicular to $OP$ and $OQ$. Furthermore we may assume that $R_1 = \bar P$ and $R_2 = \bar Q$. Then $OR_i$ is perpendicular to $OP$, $OQ$, $OR_1$ and $OR_2$ ($i = 3, \dots, p$).














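The statistic $\bar V$ can obviously be written in the form:

$\bar V = -\dfrac{\begin{vmatrix} 0 & \bar t_{1,n+1} & \cdots & \bar t_{p,n+1} \\ \bar t_{1,n+2} & \bar s_{11} & \cdots & \bar s_{1p} \\ \vdots & \vdots & & \vdots \\ \bar t_{p,n+2} & \bar s_{p1} & \cdots & \bar s_{pp} \end{vmatrix}}{\begin{vmatrix} \bar s_{11} & \cdots & \bar s_{1p} \\ \vdots & & \vdots \\ \bar s_{p1} & \cdots & \bar s_{pp} \end{vmatrix}}$.

Because of our choice of the points $R_1, \dots, R_p$, we have

(49)  $\bar t_{i,n+1} = \bar t_{i,n+2} = 0$  ($i = 3, \dots, p$)

and

(50)  $\sum_{\beta=1}^{n+2} t_{i\beta} t_{j\beta} = 0$ if $i \neq j$, except possibly for the pair $i, j = 1, 2$.

From (49) and (50) it follows that $\bar s_{ij} = 0$ for $i \neq j$ except $\bar s_{12}$ which is not necessarily zero. Hence $\bar V$ reduces to the expression

(51)  $-\dfrac{\begin{vmatrix} 0 & \bar t_{1,n+1} & \bar t_{2,n+1} \\ \bar t_{1,n+2} & \bar s_{11} & \bar s_{12} \\ \bar t_{2,n+2} & \bar s_{12} & \bar s_{22} \end{vmatrix}}{\begin{vmatrix} \bar s_{11} & \bar s_{12} \\ \bar s_{12} & \bar s_{22} \end{vmatrix}}$.

We obviously have $\bar t_{1,n+1} = a_1$, $\bar t_{2,n+1} = a_2$, $\bar t_{1,n+2} = b_1$ and $\bar t_{2,n+2} = b_2$.

For any two points $A$ and $B$ denote the length of the vector $AB$ by $\overline{AB}$. Since $n\bar s_{11} + (\bar t_{1,n+1})^2 + (\bar t_{1,n+2})^2 = \overline{O\bar P}^2$, $n\bar s_{22} + (\bar t_{2,n+1})^2 + (\bar t_{2,n+2})^2 = \overline{O\bar Q}^2$ and $n\bar s_{12} + \bar t_{1,n+1}\bar t_{2,n+1} + \bar t_{1,n+2}\bar t_{2,n+2} = \overline{O\bar P}\cdot\overline{O\bar Q}\cdot\cos\theta_3$, we can easily verify that $\bar s_{11} = a_{11}$, $\bar s_{12} = a_{12}$ and $\bar s_{22} = a_{22}$. Hence Lemma 4 is proved.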

The angles $\theta_1'$ and $\theta_2'$ can be expressed in terms of the angles $\theta_1$, $\theta_2$ and $\theta_3$. In order to show this, let us rotate the coordinate system so that the first $p$ coordinate axes lie in the flat space $L_p$ defined in Lemma 4. Let $u_1', \dots, u_{n+2}'$ be the coordinates of $P$ and $v_1', \dots, v_{n+2}'$ the coordinates of $Q$ referred to the new axes. Then, since $\overline{OP} = \overline{OQ} = 1$, we have

$\cos\theta_1 = \sqrt{u_1'^2 + \cdots + u_p'^2}$;  $\cos\theta_1' = \dfrac{u_1'v_1' + \cdots + u_p'v_p'}{\sqrt{u_1'^2 + \cdots + u_p'^2}}$;

$\cos\theta_2 = \sqrt{v_1'^2 + \cdots + v_p'^2}$;  $\cos\theta_2' = \dfrac{u_1'v_1' + \cdots + u_p'v_p'}{\sqrt{v_1'^2 + \cdots + v_p'^2}}$;

and  $\cos\theta_3 = \dfrac{u_1'v_1' + \cdots + u_p'v_p'}{\sqrt{u_1'^2 + \cdots + u_p'^2}\,\sqrt{v_1'^2 + \cdots + v_p'^2}}$.

Hence

$\cos\theta_1' = \cos\theta_2\cos\theta_3$ and $\cos\theta_2' = \cos\theta_1\cos\theta_3$.











Introducing the notations

$m_1 = \cos^2\theta_1$, $m_2 = \cos^2\theta_2$ and $m_3 = \cos\theta_1\cos\theta_2\cos\theta_3$,

we have

$a_1 = m_1$, $a_2 = m_3$, $b_1 = m_3$, $b_2 = m_2$;

$a_{11} = \dfrac{m_1 - m_1^2 - m_3^2}{n}$, $a_{12} = \dfrac{m_3(1 - m_1 - m_2)}{n}$ and $a_{22} = \dfrac{m_2 - m_2^2 - m_3^2}{n}$.

Substituting the above values in (45) we obtain

$\bar V = -n\,\dfrac{m_3}{m_3^2 - 1 + m_1 + m_2 - m_1 m_2} = -n\,\dfrac{\cos\theta_1\cos\theta_2\cos\theta_3}{\cos^2\theta_1\cos^2\theta_2\cos^2\theta_3 - \sin^2\theta_1\sin^2\theta_2}$.
Hence, Lemma 4 can be written as

Lemma 4'. Let $P$ be a point with the coordinates $u_1, \dots, u_{n+2}$ and $Q$ a point with the coordinates $v_1, \dots, v_{n+2}$. Denote by $L_p$ the flat space determined by the vectors $OR_1, \dots, OR_p$ and let $\bar P$ be the projection of $P$ on $L_p$ and $\bar Q$ the projection of $Q$ on $L_p$. Denote furthermore by $\theta_1$ the angle between $OP$ and $O\bar P$, by $\theta_2$ the angle between $OQ$ and $O\bar Q$ and by $\theta_3$ the angle between $O\bar P$ and $O\bar Q$. Then the statistic $\bar V$ defined in (41) is equal to

(45')  $-n\,\dfrac{\cos\theta_1\cos\theta_2\cos\theta_3}{\cos^2\theta_1\cos^2\theta_2\cos^2\theta_3 - \sin^2\theta_1\sin^2\theta_2}$.

If $P$ is a point of the $(n+1)$-axis and $Q$ a point of the $(n+2)$-axis, then $\bar V$ is identical with the statistic $V$ given in (25). Hence we obtain the following

Geometric interpretation of the statistic $V$ defined in (25). If $\theta_1$ denotes the angle between the $(n+1)$-axis and the flat space $L_p$ determined by the vectors $OR_1, \dots, OR_p$, $\theta_2$ the angle between the $(n+2)$-axis and the flat space $L_p$, and if $\theta_3$ denotes the angle between the projections of the last two coordinate axes on $L_p$, then the statistic $V$ is equal to the right hand side of (45').
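The passage from the determinant form (45) to the closed form in $m_1$, $m_2$, $m_3$ can be checked with exact rational arithmetic. The sketch below is only an illustration: the function names and the sample values of $m_1$, $m_2$, $m_3$ and $n$ are assumptions chosen for the check.

```python
from fractions import Fraction


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def v_from_45(m1, m2, m3, n):
    """Evaluate (45) with a_1 = m1, a_2 = b_1 = m3, b_2 = m2 and the a_ij of the text."""
    a1, a2, b1, b2 = m1, m3, m3, m2
    a11 = (m1 - m1 ** 2 - m3 ** 2) / n
    a22 = (m2 - m2 ** 2 - m3 ** 2) / n
    a12 = m3 * (1 - m1 - m2) / n
    bordered = [[0, a1, a2], [b1, a11, a12], [b2, a12, a22]]
    return -det3(bordered) / det2([[a11, a12], [a12, a22]])


def v_closed_form(m1, m2, m3, n):
    """The reduced expression -n m3 / (m3^2 - (1 - m1)(1 - m2))."""
    return -n * m3 / (m3 ** 2 - (1 - m1) * (1 - m2))


# Sample values with m3^2 < m1 * m2, as required of cosines of real angles.
m1, m2, m3, n = Fraction(1, 3), Fraction(1, 4), Fraction(1, 5), 10
print(v_from_45(m1, m2, m3, n), v_closed_form(m1, m2, m3, n))
```

Using `Fraction` avoids rounding, so the two expressions can be compared for exact equality.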

Denote by $S$ the $(2n+1)$-dimensional surface in the $(2n+4)$-dimensional space of the variables $u_1, \dots, u_{n+2}, v_1, \dots, v_{n+2}$ defined by the following equations

(52)  $\sum_{\beta=1}^{n+2} u_\beta^2 = \sum_{\beta=1}^{n+2} v_\beta^2 = 1$;  $\sum_{\beta=1}^{n+2} u_\beta v_\beta = 0$.

Denote by $C$ the $(2n+1)$-dimensional volume of the surface $S$, i.e.

(53)  $C = \int_S dS$.















































Now we will assume that $u_1, \dots, u_{n+2}, v_1, \dots, v_{n+2}$ are random variables and the joint probability distribution function is defined as follows: the point $(u_1, \dots, u_{n+2}, v_1, \dots, v_{n+2})$ is restricted to points of $S$ and the probability density function on $S$ is defined by

(54)  $\dfrac{dS}{C}$.

Hence for any subset $A$ of $S$ the probability of $A$ is equal to the $(2n+1)$-dimensional volume of $A$ divided by the $(2n+1)$-dimensional volume of $S$. It should be remarked that the probability density function (54) is identical with the probability density function we would obtain if we were to assume that $u_1, \dots, u_{n+2}, v_1, \dots, v_{n+2}$ are independently, normally distributed with zero means and unit variances and calculate the conditional density function under the restriction that $(u_1, \dots, u_{n+2}, v_1, \dots, v_{n+2})$ is a point of $S$.

Lemma 5. The probability distribution of $\bar V$ defined in (41), calculated under the assumption that the joint probability density of the variables $u_1, \dots, u_{n+2}$, $v_1, \dots, v_{n+2}$, $t_{i\beta}$ ($i = 1, \dots, p$; $\beta = 1, \dots, n+2$) is given by the product of (54) and (42), is the same as the distribution of the statistic $V$ calculated under the assumption that the variables $t_{i\beta}$ have the joint probability density function given in (34).

Lemma 5 is an immediate consequence of Lemma 3.

Lemma 6. Let $L_p$ be an arbitrary $p$-dimensional flat space in the $(n+2)$-dimensional Cartesian space, and let $M_p$ be the flat space determined by the first $p$ coordinate axes. Assuming that the joint probability density function of $u_\beta$, $v_\beta$, $t_{i\beta}$ ($i = 1, \dots, p$; $\beta = 1, \dots, n+2$) is given by the product of (54) and (42), the conditional distribution of $\bar V$ calculated under the restriction that the points $R_1, \dots, R_p$ lie in $L_p$, is the same as the conditional distribution of $\bar V$ calculated under the restriction that the points $R_1, \dots, R_p$ lie in $M_p$. The point $R_i$ denotes the point with the coordinates $t_{i1}, \dots, t_{i,n+2}$.

Proof. Let $P$ be the point with the coordinates $u_1, \dots, u_{n+2}$ and let $Q$ be the point with the coordinates $v_1, \dots, v_{n+2}$. Let us rotate the coordinate system so that the first $p$ axes lie in the flat space $L_p$. Denote the coordinates of $P$ in the new system by $u_1', \dots, u_{n+2}'$, those of $Q$ by $v_1', \dots, v_{n+2}'$, and those of $R_i$ by $t_{i1}', \dots, t_{i,n+2}'$ ($i = 1, \dots, p$). Let $S'$ be the surface defined by

(55)  $\sum u_\beta'^2 = \sum v_\beta'^2 = 1$ and $\sum u_\beta' v_\beta' = 0$.

It is clear that the surface $S'$ is identical with the surface $S$ defined in (52). It is furthermore clear that if the joint density function of $u_1, \dots, u_{n+2}, v_1, \dots, v_{n+2}$ is given by $dS/C$, the joint density function of $u_1', \dots, u_{n+2}', v_1', \dots, v_{n+2}'$ is the same, i.e. it is given by $dS'/C$. It can readily be seen that for any given set of values $u_1', \dots, u_{n+2}', v_1', \dots, v_{n+2}'$ the conditional joint probability density of the variates $t_{i\beta}'$ is given by the function obtained from (42) by substituting $t_{i\beta}'$ for $t_{i\beta}$, $u_\beta'$ for $u_\beta$ and $v_\beta'$ for $v_\beta$, provided that for any given set of values $u_1, \dots, u_{n+2}, v_1, \dots, v_{n+2}$ the joint conditional distribution of the variates $t_{i\beta}$ is given by (42). Hence, if the joint distribution of $u_1, \dots, u_{n+2}$, $v_1, \dots, v_{n+2}$ and $t_{i\beta}$ ($i = 1, \dots, p$; $\beta = 1, \dots, n+2$) is given by the product of (54) and (42), the joint probability density function of the variates $u_\beta'$, $v_\beta'$, $t_{i\beta}'$ ($i = 1, \dots, p$; $\beta = 1, \dots, n+2$) is obtained from that of $u_\beta$, $v_\beta$, $t_{i\beta}$ by substituting $S'$ for $S$ and $t_{i\beta}'$ for $t_{i\beta}$.

According to Lemma 4', $\bar V$ can be expressed as a function of the angles $\theta_1$, $\theta_2$ and $\theta_3$ defined in Lemma 4'. Each angle $\theta_k$ ($k = 1, 2, 3$) can be expressed as a function of the variables $t_{i\beta}$, $u_\beta$, $v_\beta$. It is obvious that the value of $\theta_k$ remains unchanged if we substitute $t_{i\beta}'$ for $t_{i\beta}$, $u_\beta'$ for $u_\beta$ and $v_\beta'$ for $v_\beta$. Hence also the value of $\bar V$ remains unchanged if we substitute $t_{i\beta}'$ for $t_{i\beta}$, $u_\beta'$ for $u_\beta$ and $v_\beta'$ for $v_\beta$. Lemma 6 is a consequence of this fact and of the fact that the joint probability density of the variates $t_{i\beta}'$, $u_\beta'$ and $v_\beta'$ is identical with that of the variates $t_{i\beta}$, $u_\beta$ and $v_\beta$.


Lemma 7. Assuming that the joint probability distribution of the variates $u_\beta$, $v_\beta$, $t_{i\beta}$ ($i = 1, \dots, p$; $\beta = 1, \dots, n+2$) is given by the product of (54) and (42), the conditional joint probability distribution of $u_1, \dots, u_{n+2}, v_1, \dots, v_{n+2}$, calculated under the restriction that the points $R_i = (t_{i1}, \dots, t_{i,n+2})$ ($i = 1, \dots, p$) lie in the flat space determined by the first $p$ coordinate axes, is given by

(56)  $\dfrac{\exp\Big\{-\tfrac12 \sum_{\gamma=p+1}^{n+2}\sum_{i=1}^{p} (\rho_i u_\gamma + \zeta_i v_\gamma)^2\Big\}\, f(u_1, \dots, u_{n+2}, v_1, \dots, v_{n+2})\, dS}{\int_S \exp\Big\{-\tfrac12 \sum_{\gamma=p+1}^{n+2}\sum_{i=1}^{p} (\rho_i u_\gamma + \zeta_i v_\gamma)^2\Big\}\, f(u_1, \dots, u_{n+2}, v_1, \dots, v_{n+2})\, dS}$,

where $S$ denotes the surface defined in (52), and $f(u_1, \dots, u_{n+2}, v_1, \dots, v_{n+2})$ denotes the expected value of

(57)  $\begin{vmatrix} r_{11} & \cdots & r_{1p} \\ r_{21} & \cdots & r_{2p} \\ \vdots & & \vdots \\ r_{p1} & \cdots & r_{pp} \end{vmatrix}^{(n+2-p)/2}$  $\Big(r_{ij} = \sum_{\alpha=1}^{p} t_{i\alpha} t_{j\alpha}\Big)$

calculated under the assumption that the joint distribution of the variates $t_{i\beta}$ is given by (42).

Proof. Denote by $\bar R_i$ the projection of $R_i$ on the flat space determined by the first $p$ coordinate axes, i.e. $\bar R_i = (t_{i1}, \dots, t_{ip}, 0, \dots, 0)$. Let $l_1$ be the length of $O\bar R_1$, and let $l_i$ be the distance of $\bar R_i$ from the flat space determined by the vectors $O\bar R_1, \dots, O\bar R_{i-1}$ ($i = 2, \dots, p$). Then, as is known,

(58)  $l_1 l_2 \cdots l_p = |r_{ij}|^{1/2}$  ($i, j = 1, \dots, p$),

where $r_{ij} = \sum_{\alpha=1}^{p} t_{i\alpha} t_{j\alpha}$.
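Relation (58) is the familiar fact that the product of the successive heights of a parallelotope equals the square root of its Gram determinant. A minimal numerical sketch follows; the helper names, the Gram-Schmidt computation of the heights and the random test vectors are assumptions made for the illustration.

```python
import math
import random


def dot(a, b):
    """Euclidean inner product."""
    return sum(x * y for x, y in zip(a, b))


def heights(vectors):
    """l_1 = |x_1|; l_i = distance of x_i from the span of x_1, ..., x_{i-1}."""
    basis, out = [], []
    for x in vectors:
        resid = list(x)
        for e in basis:  # remove the components along the earlier directions
            c = dot(resid, e)
            resid = [r - c * ei for r, ei in zip(resid, e)]
        h = math.sqrt(dot(resid, resid))
        out.append(h)
        basis.append([r / h for r in resid])
    return out


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    size, result = len(a), 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return result


rng = random.Random(3)
p = 4
points = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(p)]  # first p coordinates
gram = [[dot(x, y) for y in points] for x in points]              # the matrix (r_ij)
lhs = math.prod(heights(points))
rhs = math.sqrt(det(gram))
print(lhs, rhs)
```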




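We introduce the new variables

(59)  $t_{i\gamma}' = \dfrac{t_{i\gamma}}{l_i}$  ($i = 1, \dots, p$; $\gamma = p+1, \dots, n+2$).

Then the joint probability density function of the variates $u_\beta$, $v_\beta$, $t_{i\alpha}$, $t_{i\gamma}'$ ($i = 1, \dots, p$; $\beta = 1, \dots, n+2$; $\alpha = 1, \dots, p$; $\gamma = p+1, \dots, n+2$) is given by

(60)  $\dfrac{(l_1 l_2 \cdots l_p)^{n+2-p}}{C\,(2\pi)^{p(n+2)/2}} \exp\Big\{-\tfrac12\Big[\sum_{i=1}^{p}\sum_{\alpha=1}^{p}(t_{i\alpha} - \rho_i u_\alpha - \zeta_i v_\alpha)^2 + \sum_{i=1}^{p}\sum_{\gamma=p+1}^{n+2}(l_i t_{i\gamma}' - \rho_i u_\gamma - \zeta_i v_\gamma)^2\Big]\Big\} \times \Big(\prod_i\prod_\alpha dt_{i\alpha}\Big)\Big(\prod_i\prod_\gamma dt_{i\gamma}'\Big)\, dS$.

Substituting zero for $t_{i\gamma}'$ ($i = 1, \dots, p$; $\gamma = p+1, \dots, n+2$) in (60), we obtain an expression which is proportional to the conditional joint probability density of the variates $u_\beta$, $v_\beta$, $t_{i\alpha}$ ($\beta = 1, \dots, n+2$; $i = 1, \dots, p$; $\alpha = 1, \dots, p$), calculated under the restriction that the points $R_i$ ($i = 1, \dots, p$) fall in the flat space determined by the first $p$ coordinate axes. Hence this conditional density function is given by

(61)  $A \exp\Big\{-\tfrac12\sum_{\gamma=p+1}^{n+2}\sum_{i=1}^{p}(\rho_i u_\gamma + \zeta_i v_\gamma)^2\Big\} (l_1 l_2 \cdots l_p)^{n+2-p} \times \exp\Big\{-\tfrac12\sum_{i=1}^{p}\sum_{\alpha=1}^{p}(t_{i\alpha} - \rho_i u_\alpha - \zeta_i v_\alpha)^2\Big\}\, dS \prod_i\prod_\alpha dt_{i\alpha}$,

where $A$ denotes a constant. The conditional distribution of the variates $u_\beta$, $v_\beta$ ($\beta = 1, \dots, n+2$) is obtained from (61) by integrating it with respect to the variables $t_{i\alpha}$ ($i = 1, \dots, p$; $\alpha = 1, \dots, p$). Because of (58), we see that the resulting formula is identical with (56). Hence Lemma 7 is proved.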


Lemma 8. Let $m_1 = u_1^2 + \cdots + u_p^2$, $m_2 = v_1^2 + \cdots + v_p^2$, and $m_3 = u_1 v_1 + \cdots + u_p v_p$. If the joint distribution of the variates $u_1, \dots, u_{n+2}$, $v_1, \dots, v_{n+2}$ is given by (54), then the joint distribution of $m_1$, $m_2$, $m_3$ is given by

(62)  $\dfrac{B}{\sqrt{m_1 m_2 (1 - m_1)(1 - m_2)}}\, F_p(m_1) F_p(m_2)\, \Phi_p\Big(\dfrac{m_3}{\sqrt{m_1 m_2}}\Big) F_{n+2-p}(1 - m_1) \times F_{n+2-p}(1 - m_2)\, \Phi_{n+2-p}\Big(\dfrac{-m_3}{\sqrt{(1 - m_1)(1 - m_2)}}\Big)\, dm_1\, dm_2\, dm_3$

where $B$ denotes a constant,


Proof. Let $m_1' = u_{p+1}^2 + \cdots + u_{n+2}^2$, $m_2' = v_{p+1}^2 + \cdots + v_{n+2}^2$, $m_3' = u_{p+1}v_{p+1} + \cdots + u_{n+2}v_{n+2}$, $\bar m_3 = \dfrac{m_3}{\sqrt{m_1 m_2}}$ and $\bar m_3' = \dfrac{m_3'}{\sqrt{m_1' m_2'}}$. First we calculate the joint distribution of $m_1$, $m_2$, $\bar m_3$, $m_1'$, $m_2'$, $\bar m_3'$ under the assumption that $u_1, \dots, u_{n+2}, v_1, \dots, v_{n+2}$ are normally independently distributed with zero means and unit variances. This joint distribution is given by

(64)  $F_p(m_1) F_p(m_2)\, \Phi_p(\bar m_3)\, F_{n+2-p}(m_1') F_{n+2-p}(m_2') \times \Phi_{n+2-p}(\bar m_3')\, dm_1\, dm_2\, d\bar m_3\, dm_1'\, dm_2'\, d\bar m_3'$.

Hence the joint distribution of $m_1$, $m_2$, $m_3$, $m_1'$, $m_2'$, $m_3'$ is given by

(65)  $\dfrac{1}{\sqrt{m_1 m_2 m_1' m_2'}}\, F_p(m_1) F_p(m_2)\, \Phi_p\Big(\dfrac{m_3}{\sqrt{m_1 m_2}}\Big) F_{n+2-p}(m_1') F_{n+2-p}(m_2') \times \Phi_{n+2-p}\Big(\dfrac{m_3'}{\sqrt{m_1' m_2'}}\Big)\, dm_1\, dm_2\, dm_3\, dm_1'\, dm_2'\, dm_3'$.

The required conditional distribution of $m_1$, $m_2$, $m_3$ is equal to the conditional distribution of $m_1$, $m_2$, $m_3$ obtained from the joint distribution (65) under the restrictions $m_1 + m_1' = 1$, $m_2 + m_2' = 1$ and $m_3 + m_3' = 0$. Hence if in (65) we substitute $1 - m_1$ for $m_1'$, $1 - m_2$ for $m_2'$ and $-m_3$ for $m_3'$ we obtain an expression proportional to the conditional distribution of $m_1$, $m_2$, $m_3$. This proves Lemma 8.

Lemma 9. For any point $(u_1, \dots, u_{n+2}, v_1, \dots, v_{n+2})$ of the surface $S$ defined in (52) the expected value of (57) (calculated under the assumption that (42) is the joint distribution of $t_{i\beta}$) is a function of $m_1$, $m_2$ and $m_3$ only, where $m_1$, $m_2$ and $m_3$ are defined in Lemma 8.


Proof. Let $\|\lambda_{\alpha\beta}\|$ ($\alpha, \beta = 1, \dots, p$) be an orthogonal matrix such that

(66)  $\lambda_{1\beta} = \dfrac{u_\beta}{\sqrt{u_1^2 + \cdots + u_p^2}}$  ($\beta = 1, \dots, p$)

and

(67)  $\lambda_{2\beta} = \dfrac{u_\beta + \lambda v_\beta}{\sqrt{\sum_{\beta=1}^{p}(u_\beta + \lambda v_\beta)^2}}$  ($\beta = 1, \dots, p$)

where

$\lambda = -\dfrac{\sum_{\beta=1}^{p} u_\beta^2}{\sum_{\beta=1}^{p} u_\beta v_\beta}$.

Let

(68)  $t_{i\alpha}' = \sum_{\beta=1}^{p} \lambda_{\alpha\beta}\, t_{i\beta}$  ($\alpha = 1, \dots, p$).

Then the variates $t_{i\alpha}'$ are independently and normally distributed with unit variances. Since for any point of $S$, $E(t_{i\alpha}) = \rho_i u_\alpha + \zeta_i v_\alpha$, we have because of (66), (67) and (68)

$E(t_{i\gamma}') = 0$  ($i = 1, \dots, p$; $\gamma = 3, 4, \dots, p$);

$E(t_{i1}') = g_{i1}(m_1, m_2, m_3)$,

and $E(t_{i2}') = g_{i2}(m_1, m_2, m_3)$.

Hence the joint distribution of the variates $t_{i\alpha}'$ ($i = 1, \dots, p$; $\alpha = 1, \dots, p$) depends merely on $m_1$, $m_2$ and $m_3$. Since $r_{ij} = \sum_{\alpha=1}^{p} t_{i\alpha} t_{j\alpha} = \sum_{\alpha=1}^{p} t_{i\alpha}' t_{j\alpha}'$, the expression (57) can be expressed as a function of the variables $t_{i\alpha}'$. Hence the distribution of the expression (57) depends merely on the parameters $m_1$, $m_2$ and $m_3$. This proves Lemma 9.
The main result of this section is the following

Theorem. Let $V$ be the statistic given in (25) and let the joint distribution of the variates $t_{i\beta}$ ($i = 1, \dots, p$; $\beta = 1, \dots, n+2$) be given by (34). Then the probability distribution of $V$ is the same as the distribution of

(69)  $-n\,\dfrac{m_3}{m_3^2 - (1 - m_1)(1 - m_2)}$

where the joint distribution of $m_1$, $m_2$ and $m_3$ is equal to a constant multiple of the product of the following three factors: the expression (62), the exponential

$\exp\Big\{\tfrac12\Big(m_1 \sum_i \rho_i^2 + 2 m_3 \sum_i \rho_i \zeta_i + m_2 \sum_i \zeta_i^2\Big)\Big\}$

and the expected value of

(70)  $\begin{vmatrix} r_{11} & \cdots & r_{1p} \\ \vdots & & \vdots \\ r_{p1} & \cdots & r_{pp} \end{vmatrix}^{(n+2-p)/2}$  $\Big(r_{ij} = \sum_{\alpha=1}^{p} t_{i\alpha} t_{j\alpha}\Big)$.

The expected value of (70) is calculated under the assumption that the variates $t_{i\alpha}$ are normally and independently distributed with unit variances and $E(t_{i\alpha}) = \rho_i u_\alpha + \zeta_i v_\alpha$ ($i = 1, \dots, p$; $\alpha = 1, \dots, p$) where $\sum_{\alpha=1}^{p} u_\alpha^2 = m_1$, $\sum_{\alpha=1}^{p} v_\alpha^2 = m_2$ and $\sum_{\alpha=1}^{p} u_\alpha v_\alpha = m_3$. The domain of the variables $m_1$, $m_2$ and $m_3$ is given by the inequalities: $0 \le m_1 \le 1$; $0 \le m_2 \le 1$; $-\sqrt{m_1 m_2} \le m_3 \le \sqrt{m_1 m_2}$.

Proof. First we note that the expected value of (70) is a function of $m_1$, $m_2$ and $m_3$ only. Let $P$ be the point with the coordinates $u_1, \dots, u_{n+2}$, and $Q$ the point with the coordinates $v_1, \dots, v_{n+2}$. Assume that the points $R_i = (t_{i1}, \dots, t_{i,n+2})$ ($i = 1, \dots, p$) lie in the flat space determined by the first $p$ coordinate axes. Assume furthermore that $u_1 v_1 + \cdots + u_{n+2} v_{n+2} = 0$ and that the lengths of the vectors $OP$ and $OQ$ are equal to 1. Then

$\cos\theta_1 = \sqrt{u_1^2 + \cdots + u_p^2}$;  $\cos\theta_2 = \sqrt{v_1^2 + \cdots + v_p^2}$,

and

$\cos\theta_3 = \dfrac{u_1 v_1 + \cdots + u_p v_p}{\sqrt{u_1^2 + \cdots + u_p^2}\,\sqrt{v_1^2 + \cdots + v_p^2}}$

where $\theta_1$ denotes the angle between $OP$ and the flat space $L_p$ determined by the vectors $OR_1, \dots, OR_p$; $\theta_2$ denotes the angle between $OQ$ and $L_p$, and $\theta_3$ denotes the angle between the projections of $OP$ and $OQ$ on $L_p$. According to Lemma 4' the statistic $\bar V$ defined in (41) is equal to

(71)  $-n\,\dfrac{\cos\theta_1\cos\theta_2\cos\theta_3}{\cos^2\theta_1\cos^2\theta_2\cos^2\theta_3 - \sin^2\theta_1\sin^2\theta_2} = -n\,\dfrac{m_3}{m_3^2 - (1 - m_1)(1 - m_2)}$

where

(72)  $m_1 = \cos^2\theta_1 = u_1^2 + \cdots + u_p^2$, $m_2 = \cos^2\theta_2 = v_1^2 + \cdots + v_p^2$, and $m_3 = \cos\theta_1\cos\theta_2\cos\theta_3 = u_1 v_1 + \cdots + u_p v_p$.


It follows from Lemmas 5 and 6 that the distribution of $V$ is the same as the conditional distribution of $\bar V$ calculated under the assumption that the unconditional joint probability density of the variates $u_\beta$, $v_\beta$ and $t_{i\beta}$ is given by the product of (54) and (42) and under the restriction that the points $R_i$ ($i = 1, \dots, p$) fall in the flat space determined by the first $p$ coordinate axes. Since

$\exp\Big\{-\tfrac12\sum_{\gamma=p+1}^{n+2}\sum_{i=1}^{p}(\rho_i u_\gamma + \zeta_i v_\gamma)^2\Big\}$ is a constant multiple of

(73)  $\exp\Big\{\tfrac12\Big(m_1 \sum_i \rho_i^2 + 2 m_3 \sum_i \rho_i \zeta_i + m_2 \sum_i \zeta_i^2\Big)\Big\}$,

from Lemmas 7, 8 and 9 it follows readily that the joint conditional distribution of $m_1 = u_1^2 + \cdots + u_p^2$, $m_2 = v_1^2 + \cdots + v_p^2$ and $m_3 = u_1 v_1 + \cdots + u_p v_p$ is equal to a constant multiple of the product of (62), (73) and the expected value of (70). This proves our theorem.


It can be shown that the variates $m_1$, $m_2$ and $m_3$ are of the order $\dfrac{1}{n}$ in the probability sense. Hence

(74)  $-n\,\dfrac{m_3}{m_3^2 - (1 - m_1)(1 - m_2)} = n m_3 (1 + \epsilon)$

where $\epsilon$ is of the order $\dfrac{1}{n}$. Hence we can say: even for moderately large $n$ the distribution of the statistic $V$ is well approximated by the distribution of $n m_3$, where the joint distribution of $m_1$, $m_2$ and $m_3$ is equal to a constant multiple of the product of (62), (73) and the expected value of (70).

If $n + 2 - p$ is an even integer, the expected value of (70) is obviously an elementary function of $m_1$, $m_2$ and $m_3$. Hence, if $n + 2 - p$ is even, the joint distribution of $m_1$, $m_2$ and $m_3$ is also an elementary function of $m_1$, $m_2$ and $m_3$.

If the constants $\rho_i$ and $\zeta_i$ ($i = 1, \dots, p$) in formula (34) are equal to zero, the expected value of (70) is a constant and the joint distribution of $m_1$, $m_2$ and $m_3$ is given by (62).





ASYMPTOTIC DISTRIBUTION OF RUNS UP AND DOWN^1


By J. Wolfowitz


Columbia University


1. Introduction. Let $a_1, a_2, \dots, a_n$ be any $n$ unequal numbers and let $S = (h_1, h_2, \dots, h_n)$ be a random permutation of them, with each permutation having the same probability, which is therefore $\dfrac{1}{n!}$. Let $R$ be the sequence of signs (+ or -) of the differences $h_{i+1} - h_i$ ($i = 1, 2, \dots, n - 1$). Then $R$ is also a chance variable. A sequence of $p$ successive + (-) signs not immediately preceded or followed by a + (-) sign is called a run up (down) of length $p$. The term "run" applies to both runs up and runs down. As an example, if $S = (4\ 6\ 2\ 3\ 5)$, then in $R = (+\ -\ +\ +)$ there are three runs, one up of length one, one down of length one, and one up of length two.
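The definitions above translate directly into a short computation. The following sketch, with function names chosen here only for illustration, forms the sign sequence $R$ and lists the runs up and down for the example $S = (4\ 6\ 2\ 3\ 5)$.

```python
def sign_sequence(s):
    """Signs of the successive differences h_{i+1} - h_i (elements assumed unequal)."""
    return ['+' if b > a else '-' for a, b in zip(s, s[1:])]


def runs(s):
    """Runs up and down of s as a list of (direction, length) pairs, in order."""
    grouped = []
    for sign in sign_sequence(s):
        if grouped and grouped[-1][0] == sign:
            # Same sign as the previous difference: the current run gets longer.
            grouped[-1] = (sign, grouped[-1][1] + 1)
        else:
            # Sign change (or first difference): a new run starts here.
            grouped.append((sign, 1))
    return [('up' if sign == '+' else 'down', length) for sign, length in grouped]


example = (4, 6, 2, 3, 5)
print(sign_sequence(example))  # ['+', '-', '+', '+']
print(runs(example))           # [('up', 1), ('down', 1), ('up', 2)]
```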

The purpose of this paper is to establish several theorems about the limiting 
distributions of a class of functions of runs up and down. These results are 
applicable to certain techniques which have been employed in quality control 
and the analysis of economic time series. They are also shown to apply to a 
large class of “‘runs.” 


2. Joint distribution of runs of several lengths. Let $r_p$ be the number of runs of length $p$ in $R$ and $r_p'$ the number of runs of length $p$ or more in $R$. Then $r_p$ and $r_p'$ are chance variables. The expectations $E(r_p)$ and $E(r_p')$, the variances $\sigma^2(r_p)$ and $\sigma^2(r_p')$, and the covariances among them are given by Levene and Wolfowitz [1]. They are all of the order $n$. Let

$y_p = \dfrac{r_p - E(r_p)}{\sqrt{n}}$,

$y_p' = \dfrac{r_p' - E(r_p')}{\sqrt{n}}$.

Our first results are embodied in the following theorem:

Theorem 1. Let $l$ be any non-negative integer. The joint distribution of $y_1, \dots, y_l, y_{(l+1)}'$ approaches the normal distribution as $n \to \infty$.

We shall give the proof for the case $l = 1$, but it will easily be seen to be perfectly general.
Let $x_{pi} = 1$ if the sign (+ or -) of $h_{i+1} - h_i$ is the initial sign of a run of length $p$, and let $x_{pi} = 0$ otherwise. Let $w_{pi} = 1$ if the sign of $h_{i+1} - h_i$ is the initial sign of a run of length $p$ or more, and let $w_{pi} = 0$ otherwise. Let $x_{pn} = w_{pn} = 0$. Then

$r_p = \sum_{i=1}^{n} x_{pi}$,

$r_p' = \sum_{i=1}^{n} w_{pi}$.

1 Part of the results of this paper was presented to the Institute of Mathematical Statistics and the American Mathematical Society at their joint meeting in New Brunswick, N. J., on September 13, 1943.


Now write $\alpha$ = n', $\beta$ = n', and consider the $\beta$ sequences

$h_{(j-1)\alpha+1}, h_{(j-1)\alpha+2}, \dots, h_{j\alpha}$  ($j = 1, 2, \dots, \beta$).

(Strictly speaking, we should employ the largest integer in $\alpha$. Since what is meant is clear and since we are dealing with an asymptotic property, we shall omit this useless nicety.) Let $x_{pi}'$ and $w_{pi}'$ have the same definitions relative to each of these sequences that $x_{pi}$ and $w_{pi}$ have relative to the sequence $S$. The accented and unaccented $x$'s and $w$'s are not always the same, because the partitioning of the sequence $S$ sometimes breaks up runs and creates others. Thus we might have $x_{p\alpha} = 1$, but $x_{p\alpha}'$ always $= 0$.
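It is easy to see that there exists a positive number $d$ such that

$\sum_{i=1}^{n} |x_{pi} - x_{pi}'| < d\beta$,

$\sum_{i=1}^{n} |w_{pi} - w_{pi}'| < d\beta$.

If, therefore, we define

$z_1 = \dfrac{1}{\sqrt{n}}\sum_{i=1}^{n} [x_{1i}' - E(x_{1i}')]$,

$z_2 = \dfrac{1}{\sqrt{n}}\sum_{i=1}^{n} [w_{2i}' - E(w_{2i}')]$,

we have

$|z_1 - y_1| < \dfrac{2d\beta}{\sqrt{n}}$,

$|z_2 - y_2'| < \dfrac{2d\beta}{\sqrt{n}}$.

Hence, if the joint limiting distribution of $z_1$ and $z_2$ is normal, so is that of $y_1$ and $y_2'$.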
The chance variables

$r_{1j} = \sum_{i=(j-1)\alpha+1}^{j\alpha} x_{1i}'$,

$r_{2j} = \sum_{i=(j-1)\alpha+1}^{j\alpha} w_{2i}'$  ($j = 1, 2, \dots, \beta$)

have the same joint distribution for all values of $j$. For $x_{1i}'$ and $w_{2i}'$, ($(j-1)\alpha < i \le j\alpha$), depend only on the relative magnitude of the elements of the sequence

$h_{(j-1)\alpha+1}, \dots, h_{j\alpha}$,

not upon the particular values which the elements take, and all permutations of the sequence have equal probability. Clearly $r_{1j}$ and $r_{2j}$ are independent, in the probability sense, of $r_{1j'}$ and $r_{2j'}$ ($j \neq j'$), because of the definitions of $x_{1i}'$ and $w_{2i}'$. (However, $r_{1j}$ and $r_{2j}$ are not independent, because $x_{1i}'$ and $w_{2i}'$ cannot both be 1.) From the results of [1] it follows that for sufficiently large $n$ the absolute value of the correlation coefficient between $r_{1j}$ and $r_{2j}$ is less than a number smaller than 1. By the methods of [1] it can easily be shown that the ratio of the fourth order moments of $r_{1j}$ and $r_{2j}$ about their means to the square of the variance of either, is bounded for sufficiently large $n$. Hence by Liapounoff's theorem (see, for example, Cramér [2], Uspensky [5]), $z_1$ and $z_2$ are jointly normally distributed in the limit. Hence so are $y_1$ and $y_2'$ and the theorem is proved.
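The statement that the moments of the run counts are of order $n$ can be illustrated for the total number of runs $r_1'$ by exact enumeration over all permutations of small $n$. The comparison value $(2n - 1)/3$ used below is the standard closed form for $E(r_1')$; it is quoted here as an outside fact and is not stated in the text above. The function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def total_runs(seq):
    """Total number of runs up and down, i.e. r'_1 for the sequence seq."""
    signs = [b > a for a, b in zip(seq, seq[1:])]
    # A run starts at the first sign and at every change of sign.
    return sum(1 for k, s in enumerate(signs) if k == 0 or s != signs[k - 1])


def exact_mean_total_runs(n):
    """E(r'_1) over all n! equally likely permutations, as an exact fraction."""
    perms = list(permutations(range(n)))
    return Fraction(sum(total_runs(p) for p in perms), len(perms))


for n in range(2, 8):
    print(n, exact_mean_total_runs(n), Fraction(2 * n - 1, 3))
```

Enumeration is feasible only for small $n$; for larger $n$ one would sample permutations instead.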


3. Generalization of Theorem 1. Examination of the proof of Theorem 1 
shows that it rests on the following two properties of runs up and down: 

a) Partition of the sequence S into subsequences affects at most d runs in 
each sub-sequence, where d is a fixed positive number independent of n. 

b) After partition the totals of runs of each length in any sub-sequence (the 
definition now relates to the subsequence) are independent in the probability 
sense of the totals of runs in any other subsequence, and satisfy some condition 
(such as the Liapounoff) sufficient to make the components of the sum of the 
vectors jointly normally distributed in the limit. 
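Property (a) can be observed numerically for the total number of runs: cutting the sequence into blocks changes the total by at most one per cut. The sketch below uses an arbitrary block length and a seeded random sequence; these choices, and the function names, are assumptions made for the illustration and are not the $\alpha$ and $\beta$ of the proof.

```python
import random


def count_runs(seq):
    """Total number of runs up and down in seq (0 if seq has fewer than 2 elements)."""
    signs = [b > a for a, b in zip(seq, seq[1:])]
    return sum(1 for k, s in enumerate(signs) if k == 0 or s != signs[k - 1])


def blocks(seq, length):
    """Consecutive sub-sequences of the given length (the last may be shorter)."""
    return [seq[k:k + length] for k in range(0, len(seq), length)]


rng = random.Random(1)
n, block_length = 1000, 50
s = rng.sample(range(10 * n), n)          # n distinct numbers in random order
parts = blocks(s, block_length)
whole = count_runs(s)
split_total = sum(count_runs(b) for b in parts)
cuts = len(parts) - 1
print(whole, split_total, cuts)
```

Each cut removes one sign from $R$: this either splits a run in two, shortens a run, or deletes a run of length one, so the total changes by at most one per cut.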

Hence if we adopt other definitions of runs which meet conditions (a) and (b) above, the total numbers of each of these various kinds of runs will be in general jointly asymptotically normally distributed. For example, if $s_p$ and $s_p'$ be the numbers of runs up of length $p$ and of length $p$ or more, respectively, and if $t_p$ and $t_p'$ are the same quantities referring to runs down, then, with $l$ and $k$ any positive integers,

$s_1, s_2, \dots, s_l, s_{(l+1)}', t_1, t_2, \dots, t_k$

are jointly asymptotically normally distributed. However, if $t_{(k+1)}'$ is included in this set, since

$s_1' = s_1 + s_2 + \cdots + s_l + s_{(l+1)}'$

and

$t_1' = t_1 + t_2 + \cdots + t_k + t_{(k+1)}'$

differ by at most one, the limiting distribution is degenerate, i.e., its covariance matrix is only semi-definite.

As another example, if we define a bizarre run as, say, the occurrence of a run up of length 5, followed, 17 elements later, by a run down of length 14, then the number of runs of this type is asymptotically normally distributed with expectation and variance of order $n$.


4. Additive functions of runs of all lengths. Combining the numbers of runs 
of all lengths greater than a given length generally involves a loss of information. 
The following theorem on additive functions of runs up and down may be of 
general interest and of utility in avoiding this undesirable situation. 

Theorem 2. Let $f(i)$ be a function, defined for all positive integral values of $i$, which fulfills the following conditions:

a) There exists a pair of positive integers, $a$ and $b$, such that

(4.1)  $\dfrac{f(a)}{f(b)} \neq \dfrac{a}{b}$,

b) for any $\epsilon_1 > 0$ there exists a positive integer $N(\epsilon_1)$ such that, for all $n > N(\epsilon_1)$,

(4.2)  $\sum_{i=N(\epsilon_1)}^{n-1} |f(i)|\,\sigma(r_i) < \epsilon_1 \sqrt{n}$,

where $n$, of course, has the same meaning as in the preceding sections. Let $F(S)$, a function of the chance sequence $S$, be defined as follows:

(4.3)  $F(S) = \sum_{i=1}^{n-1} f(i)\, r_i$.

Then the distribution of $\dfrac{F(S) - E[F(S)]}{\sigma[F(S)]}$ approaches the normal distribution as $n \to \infty$.

As an example, let $f(i) \equiv 1$. Then $F(S) = r_1'$, whose limiting distribution is normal by Theorem 1.
This theorem is the exact analogue of Theorem 2 of [3] and the proof of the 
latter carries over without difficult changes except in one important respect. A 
difficulty in the proof of the theorem in [3] lay in proving Lemma 4, and this 
lemma has to be proved completely anew. We shall limit ourselves here to doing 
just that. Lemma 2 of Theorem 2 of [3], whose only role was to help in proving 
Lemma 4, has no analogue in our present problem, but all the others do. It 
will therefore be sufficient if we prove the following: 





Lemma. There exists a constant $c > 0$, such that, for all $n$ sufficiently large,

(4.4)  $\sigma^2[F(S)] > cn$.

Condition (a) of the theorem is imposed simply in order that the result be not trivial. For, if (a) does not hold, we have that

$f(i) = i f(1)$,

and

$F(S) = f(1) \sum_i i\, r_i = (n - 1) f(1)$ = a constant.

Suppose that

$f(i) = ui + v$,

with $u$ and $v$ constants, and $v \neq 0$. Then by Theorem 1

$F(S) = u(n - 1) + v r_1' = v r_1'$ + a constant

is asymptotically normally distributed with variance of order $n$. Without loss of generality we may therefore assume that

(4.5)  $f(i) \not\equiv ui + v$.

From this it follows that there exists an integer $A \ge 2$ such that

(4.6)  $f(A - 1) + f(A + 1) \neq 2f(A)$.


Our object is to prove that $\sigma^2[F(S)]$ is at least of order $n$. The basic idea of the proof will be to construct two sets, say $L_1$ and $L_2$, of sequences $S$, such that the (same) probability of each is not less than a positive lower bound independent of $n$, and such that there exists a one-to-one correspondence between the sequences of $L_1$ and those of $L_2$, so that, if $S_1$ is a member of $L_1$ and $S_2$ the corresponding sequence in $L_2$,

$|F(S_1) - F(S_2)| \ge g\sqrt{n}$,

where $g$ is a positive constant independent of $n$. It is easy to see that such a construction would prove the lemma.

We shall call the subsequence $(h_i, h_{i+1}, \dots, h_{i+2A})$ of $S$ a run of type $T_1$ or simply a run $T_1$ (the notion will be used only for the proof of this lemma) if the following conditions are fulfilled:

(4.7) each of the signs of $(h_{i+1} - h_i)$ and $(h_{i+A+1} - h_{i+A})$ is the initial sign of a run of length $A$.

(4.8) if $i \neq 1$, the sign of $(h_i - h_{i-1})$ is not the final sign of a run of length $A$.

(4.9) if $i + 2A \neq n$, the sign of $(h_{i+2A+1} - h_{i+2A})$ is not the initial sign of a run of length $A$.

(4.10) after the transformation $H$, which interchanges $h_{i+A-1}$ and $h_{i+A}$, has operated on the run, the sign of $(h_{i+1} - h_i)$ is the initial sign of a run of length $A - 1$, and the sign of $(h_{i+A-1} - h_{i+A})$, in the new ordering, is the initial sign of a run of length $A + 1$.

Thus, with $A = 2$ and $n = 7$, if $S = (7\ 1\ 4\ 5\ 3\ 2\ 6)$, then $R = (-\ +\ +\ -\ -\ +)$, and $(1\ 4\ 5\ 3\ 2)$ is a run $T_1$, for after the transformation $H$ has been applied we have $(1\ 5\ 4\ 3\ 2)$ which gives $(+\ -\ -\ -)$. The result of the operation $H$ on a run $T_1$ will be called a run $T_2$.
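Conditions (4.7) to (4.10) can be checked mechanically. The sketch below is one possible reading of them, with 1-based positions as in the text; the function names and the encoding of runs as (start, length) pairs are assumptions made for the illustration. It confirms the worked example with $A = 2$.

```python
def sign_runs(seq):
    """Runs of the sign sequence R as (start, length); start is a 1-based sign index."""
    signs = [b > a for a, b in zip(seq, seq[1:])]
    out = []
    for k, s in enumerate(signs, start=1):
        if k > 1 and s == signs[k - 2]:
            start, length = out[-1]
            out[-1] = (start, length + 1)
        else:
            out.append((k, 1))
    return out


def apply_H(seq, i, A):
    """The transformation H: interchange h_{i+A-1} and h_{i+A} (1-based positions)."""
    new = list(seq)
    new[i + A - 2], new[i + A - 1] = new[i + A - 1], new[i + A - 2]
    return new


def is_run_T1(seq, i, A):
    """True if (h_i, ..., h_{i+2A}) satisfies (4.7)-(4.10) as read here."""
    n = len(seq)
    if i < 1 or i + 2 * A > n:
        return False
    start_len = dict(sign_runs(seq))
    end_len = {start + length - 1: length for start, length in sign_runs(seq)}
    if start_len.get(i) != A or start_len.get(i + A) != A:      # (4.7)
        return False
    if i != 1 and end_len.get(i - 1) == A:                      # (4.8)
        return False
    if i + 2 * A != n and start_len.get(i + 2 * A) == A:        # (4.9)
        return False
    after = dict(sign_runs(apply_H(seq, i, A)))                 # (4.10)
    return after.get(i) == A - 1 and after.get(i + A - 1) == A + 1


S = [7, 1, 4, 5, 3, 2, 6]
print(is_run_T1(S, 2, 2))      # the run (1 4 5 3 2) starts at position i = 2
print(apply_H(S, 2, 2)[1:6])   # the run after H has been applied
```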

The number $r^*$ of runs $T_1$ and the number $r^{**}$ of runs $T_2$ each have expected values and variances of order $n$, by considerations similar to those of [1]. Hence, for an arbitrarily small positive $\epsilon$ there exists a positive constant $q$ such that, for all $n$ sufficiently large, the probability $P\{r^* + r^{**} > qn\}$ of the set $L^*$ of sequences $S$ which satisfy the relation in braces, is not less than $1 - \epsilon$.

The set $L^*$ can be divided into disjunct sets (families) as follows: Let $S(0)$ be any sequence $S$ in $L^*$ which has no runs $T_2$ (any doubt about the existence of such sequences will be soon removed) and let $r^*[S(0)] = m$. Hence $m > qn$. Operating with the transformation $H$ on each of the $m$ runs $T_1$ of $S(0)$ we get a set $S(1)$ of $m$ different sequences for each of which $r^* = m - 1$, $r^{**} = 1$. Operating again with $H$ on each of the pairs of runs $T_1$ of the sequence $S(0)$ we get a set $S(2)$ of $\binom{m}{2}$ distinct sequences for each of which $r^* = m - 2$, $r^{**} = 2$, etc. The process stops with $S(m)$, which contains a single sequence, for which $r^* = 0$, $r^{**} = m$. The set $S(i)$ contains $\binom{m}{i}$ different sequences for each of which $r^* = m - i$, $r^{**} = i$. The union of the sets $S(i)$ ($i = 0, 1, 2, \dots, m$) will be called the family whose generator is $S(0)$. The sets $S(i)$ are obviously disjunct. Any sequence $S$ in $L^*$ belongs to one and only one family. For if we operate on all of its runs $T_2$ with $H$ (which is its own inverse), we obtain the generator of the family to which it belongs. This also proves the existence of sequences in $L^*$ for which $r^{**} = 0$.

Consider any family $F$ whose generator is a sequence for which $r^* = m > qn$.
It is easy to see that, when $n$ is sufficiently large, the ratio of the total number of
sequences $S$ in the sets $L_1^F$ and $L_2^F$, where

$$L_1^F = \sum_{i=0}^{\frac12(m - \sqrt m)} S(i), \qquad L_2^F = \sum_{i=\frac12(m + \sqrt m)}^{m} S(i),$$

to the total number of sequences in $F$ is greater than a fixed positive constant $K'$.
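The "easy to see" step is a statement about binomial tails: a family has $2^m$ sequences in all, of which $\binom{m}{i}$ lie in $S(i)$. As a rough numerical illustration (a sketch only; the function name `tail_fraction` is hypothetical and the upper limit $\frac12(m - \sqrt m)$ is rounded down), the share of one tail can be computed exactly. By the normal approximation to the binomial distribution this share tends to the standard normal probability of a deviation below $-1$, roughly 0.16, so it stays bounded away from zero.

```python
from math import comb, sqrt


def tail_fraction(m):
    """Share of the 2**m sequences of a family that fall in the lower tail.

    The lower tail collects S(i) for i = 0, ..., floor((m - sqrt(m)) / 2);
    by the symmetry of the binomial coefficients the upper tail has
    essentially the same share.
    """
    upper = int((m - sqrt(m)) / 2)
    return sum(comb(m, i) for i in range(upper + 1)) / 2 ** m


for m in (100, 400, 1600, 6400):
    print(m, round(tail_fraction(m), 4))
```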
We are now ready to construct $L_1$ and $L_2$. The set $L_1$ is the union of the sets
$L_1^F$ of all the families in $L^*$, and the set $L_2$ is the union of the sets $L_2^F$ of all the
families in $L^*$. The probability of $L_1$ and of $L_2$ is therefore not less
than $\tfrac12 K'(1 - \epsilon)$. The one-to-one correspondence is effected as follows: The
subset $S\left(\tfrac12(m - \sqrt m) - i\right)$ of the set $L_1^F$ of any family is to correspond to the
subset $S\left(\tfrac12(m + \sqrt m) + i\right)$ $\left(i = 0, 1, 2, \cdots, \tfrac12(m - \sqrt m)\right)$ of the set $L_2^F$ of the same
family. The individual sequences of either of the two subsets may be made
to correspond to those of the other in any manner whatsoever. Any sequence
$S_1$ in $L_1$ and its corresponding sequence $S_2$ in $L_2$ thus differ only in the numbers
of runs $T_1$ and $T_2$, but are identical in the numbers of all other runs. They
differ in at least $\sqrt m$ runs. Hence,

$$|F(S_1) - F(S_2)| \ge \sqrt{m}\,|2f(A) - f(A - 1) - f(A + 1)| > \sqrt{qn}\,|2f(A) - f(A - 1) - f(A + 1)|.$$

This is the required result with

$$g = \sqrt{q}\,|2f(A) - f(A - 1) - f(A + 1)|.$$

Hence the lemma and the theorem are proved.
The remarks of section 3 also apply to Theorem 2.


5. The distribution of long runs. Certain tests in use in quality control of
manufactured products are based on the occurrence of long runs. Since the
mean and variance of $r_p$, for any fixed $p$, are of order $n$, it follows that the
probability that $r_p \ne 0$ approaches 1 (with increasing $n$). In order to base a test
on the occurrence of a run of length $p$ in long sequences it is therefore necessary
to make $p$ a function of $n$. This function must be a suitable one, because if $p$
is, for example, of the order $n$, the probability that $r_p = 0$ approaches 1; $p$
should, therefore, be neither too short nor too long.

The following theorem will help give the answer to this problem: 

THEOREM 3. Let $p$ vary with $n$, so that

$$\frac{(p+1)!}{n} = \frac{1}{K}$$

with $K$ a fixed positive number. Then

$$\lim_{n \to \infty} P\{r_p = j\} = e^{-2K}\frac{(2K)^j}{j!} \qquad (j = 0, 1, 2, \cdots)$$

i.e., $r_p$ has in the limit the Poisson distribution with mean $2K$.

The proof will consist in showing that the moments of $r_p$ approach the moments
of a Poisson distribution with mean $2K$ as $n \to \infty$. This is sufficient (v. Mises
[4]).

Let $x_i = 1$ if the sign of $h_{i+1} - h_i$ is the initial sign of a run of length $p$, and
$x_i = 0$ otherwise. The probability that $x_i = 1$ is, by [1], Section 4,

$$\frac{2(p^2 + 3p + 1)}{(p + 3)!}$$

for all $i$ with a fixed number of exceptions.² Write $B = \dfrac{2}{(p + 1)!}$; then

$$P\{x_i = 1\} = B + o(B),$$

where the symbol $o(B)$ means that $\lim \dfrac{o(B)}{B} = 0$. Let $y_i$ $(i = 1, 2, \cdots, n)$ be
independent chance variables with the same distribution: $P\{y_i = 1\} = B$,
$P\{y_i = 0\} = 1 - B$. Then it is easy to see that $Y = \sum_{i=1}^{n} y_i$ has in the limit the
Poisson distribution with mean $2K$ and that its moments approach the moments
of the same Poisson distribution. Hence it will be sufficient to show that in the
limit $Y$ and $r_p$ have the same moments.

² Since these exceptions (at the ends of the sequence $S$) have no effect on the asymptotic
theory, they will henceforth be ignored.
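The relation $P\{x_i = 1\} = B + o(B)$ can be checked with exact arithmetic. The sketch below assumes the probability is read as $2(p^2 + 3p + 1)/(p + 3)!$ and $B$ as $2/(p + 1)!$; the function names are hypothetical. The ratio of the two simplifies to $(p^2 + 3p + 1)/((p + 2)(p + 3))$, which tends to 1 as $p$ grows.

```python
from fractions import Fraction
from math import factorial


def prob_initial_sign(p):
    """Assumed probability that a given sign starts a run of length p."""
    return Fraction(2 * (p * p + 3 * p + 1), factorial(p + 3))


def b_value(p):
    """The approximating quantity B = 2 / (p + 1)!."""
    return Fraction(2, factorial(p + 1))


def ratio(p):
    """P{x_i = 1} / B, computed exactly."""
    return prob_initial_sign(p) / b_value(p)


for p in (1, 5, 20, 100, 1000):
    print(p, float(ratio(p)))
```

Because the ratio approaches 1 only at the rate $1/p$, and $p$ grows very slowly with $n$ under the hypothesis of Theorem 3, the Poisson limit may be expected to set in slowly.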


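If $q, a_1, a_2, \cdots, a_q$ and $i_1 < i_2 < \cdots < i_q$ are positive integers, we have that

$$E(y_{i_1}^{a_1} y_{i_2}^{a_2} \cdots y_{i_q}^{a_q}) = E(y_{i_1} y_{i_2} \cdots y_{i_q}) = \prod_{j=1}^{q} E(y_{i_j}) = B^q \tag{5.1}$$

and

$$0 \le E(x_{i_1}^{a_1} x_{i_2}^{a_2} \cdots x_{i_q}^{a_q}) = E(x_{i_1} x_{i_2} \cdots x_{i_q}). \tag{5.2}$$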















Also

$$E(r_p^l) = E\left[\left(\sum_{i=1}^{n} x_i\right)^l\right]. \tag{5.3}$$

After expansion of the right member of (5.3), we may replace, in accord with
(5.2), each of the non-zero exponents of the $x$'s by 1. The same operation on the
terms of the expansion of the right member of

$$E(Y^l) = E\left[\left(\sum_{i=1}^{n} y_i\right)^l\right] \tag{5.4}$$

is valid in accord with (5.1).
Let $i_1 < i_2 < \cdots < i_q$. In the expression

$$E(x_{i_1} x_{i_2} \cdots x_{i_q}), \tag{5.5}$$

let $q$ be the "weight." A subsequence of consecutive $x$'s in (5.5) (it may consist
of a single $x$) which is such that the indices of two consecutive $x$'s differ by less
than $(p + 3)$, while the subsequence cannot be expanded on either side without
violating this requirement, will be called a "cycle." Let $c$ be the number of
cycles in (5.5). By [1], Section 4, if $x_i$ and $x_j$ are in different cycles,
i.e., $|i - j| \ge (p + 3)$, then $x_i$ and $x_j$ are independently distributed. If,
therefore, $q = c$, we have that

$$E(x_{i_1} x_{i_2} \cdots x_{i_q}) = \prod_{j=1}^{q} E(x_{i_j}) = B^q + o(B^q). \tag{5.6}$$

If $q > c = 1$, we have, also from [1], Section 4, that

$$E(x_{i_1} x_{i_2} \cdots x_{i_q}) \le E(x_{i_1} x_{i_2}) = o(B). \tag{5.7}$$










If $q > c$ and if there are two indices in the expression (5.5) which differ by less
than $p$, then

$$E(x_{i_1} x_{i_2} \cdots x_{i_q}) = 0. \tag{5.8}$$

For $x_i$ and $x_j$ cannot both initiate runs of length $p$ if $|i - j| < p$.
Let us now return to the expansions of the right members of (5.3) and (5.4),
in which the exponents have been replaced as described before. Let the weight
and cycle definitions also apply to terms of the type

$$E(y_{i_1} y_{i_2} \cdots y_{i_q}). \tag{5.9}$$

From (5.1) and (5.6) it follows that, in the limit, the contributions to $E(r_p^l)$
and $E(Y^l)$ of the sums of those terms for which $q = c$, are the same. Let $W$ and
$W'$ be the sums of all the remaining terms in $E(r_p^l)$ and $E(Y^l)$, respectively. If
we can show that

$$\lim W = \lim W' = 0 \tag{5.10}$$

we will have proven that

$$\lim E(r_p^l) = \lim E(Y^l) \tag{5.11}$$

and with it the theorem.

Let $B = O[f(n)]$ mean, as usual, that $|B| < M f(n)$ for all $n$ and a fixed $M > 0$.
The number of terms in $W'$ with fixed $q$ and $c$ ($c < q$, by definition of $W'$) is
$O(n^c p^{q-c})$. From (5.1) the value of the sum of all such terms is $O(B^q n^c p^{q-c})$.
Now

$$nB = O(1)$$

by the hypothesis of the theorem. From the definition of $p$,

$$p = o(n)$$

and hence

$$pB = o(1).$$

Therefore

$$B^q n^c p^{q-c} = (nB)^c (pB)^{q-c} = o(1).$$

Since $q \le l$, there are only a fixed number of such sums. Hence $\lim W' = 0$.
The number of terms in $W$ with fixed $q$ and $c$ ($c < q$) is $O(n^c p^{q-c})$. However,
most of these are of the type in (5.8) and therefore vanish. Those which do not
vanish are $O(n^c)$ in number. Since $q > c$ we have by application of (5.7) that
each term is $o(B^c)$. Hence the value of the sum of these terms is $o(n^c B^c) = o(1)$.
Since $q \le l$, there are a fixed number of such sums. Hence $\lim W = 0$.
This proves (5.10) and with it the theorem.







It is possible to generalize this result in a manner similar to that of Section 3. 
The author is obliged to W. Allen Wallis who first drew his attention to prob- 
lems in runs up and down, and to Howard Levene, who read the manuscript of 


this paper. 


REFERENCES 


[1] H. LEVENE and J. WOLFOWITZ, Annals of Math. Stat., Vol. 15 (1944).
[2] HARALD CRAMÉR, Random Variables and Probability Distributions, Cambridge, 1937.
[3] J. WOLFOWITZ, Annals of Math. Stat., Vol. 13 (1942), p. 247.
[4] R. v. MISES, Zeitschrift fuer die angewandte Math. und Mechanik, Vol. 1 (1921), p. 298.
[5] J. V. USPENSKY, Introduction to Mathematical Probability, New York, 1937.





STATISTICAL ANALYSIS OF CERTAIN TYPES OF RANDOM 
FUNCTIONS 
By H. HURWITZ, JR. AND M. KAC
Cornell University

1. Introduction. In solving certain physical problems (Brownian movements,
shot effect) one is often led to the study of superpositions of random pulses.
More precisely, one is led to sums of the type

$$F(t) = \sum_{j=1}^{N} f(t - t_j), \tag{1}$$

where $N$ and the $t_j$'s are random variables and a function $P(t)$ is given such that
$\int_{\Delta} P(t)\,dt$ represents the average number of pulses occurring during the time
interval $\Delta$.

We propose to give a fairly detailed treatment of those statistical properties 
of F(t) which may be of interest to a physicist and at the same time pay careful 
attention to the mathematical assumptions which underlie the applications. It
may also be pointed out that our results could be applied to the theory of time 
series. 


2. Statistical assumptions and the distribution of N. The statistical assump- 

tions can be formulated as follows: 

1. The $t_j$'s form an infinite sequence of independent identically distributed
random variables each having $p(t)$ as its probability density.

2. $N$ is capable of assuming the values $0, 1, 2, 3, \cdots$ only, and $N$ is independent
of the $t_j$'s.

3. If $M(\Delta; N)$ denotes the number of those $t_j$'s among the first $N$ which fall
within the interval $\Delta$, then for non-overlapping intervals $\Delta_1$ and $\Delta_2$ the
random variables $M(\Delta_1; N)$ and $M(\Delta_2; N)$ are independent.

We now state our first theorem.¹

THEOREM 1. Assumptions 1, 2, 3 imply that $N$ is distributed according to
Poisson's law, i.e.

$$\text{Prob}\,\{N = r\} = e^{-h}\frac{h^r}{r!},$$

where $h = \int_{-\infty}^{\infty} P(t)\,dt$.

¹ For a different approach to Poisson's distribution see W. FELLER, Math. Ann. 113
(1937), in particular pp. 113-160.


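Our proof is based on considerations of characteristic functions. Let $\psi_\Delta(x)$
be 1 if $x$ belongs to the interval $\Delta$ and 0 otherwise. Thus

$$M(\Delta; N) = \sum_{j=1}^{N} \psi_\Delta(t_j).$$

From the independence of $M(\Delta_1; N)$ and $M(\Delta_2; N)$ it follows that for every
pair of real numbers $\xi$ and $\eta$ we have²

$$E\left[\exp\left\{i\left(\xi \sum_{j=1}^{N} \psi_{\Delta_1}(t_j) + \eta \sum_{j=1}^{N} \psi_{\Delta_2}(t_j)\right)\right\}\right]
= E\left[\exp\left\{i\xi \sum_{j=1}^{N} \psi_{\Delta_1}(t_j)\right\}\right] E\left[\exp\left\{i\eta \sum_{j=1}^{N} \psi_{\Delta_2}(t_j)\right\}\right],$$

where $E[x]$ denotes the mathematical expectation, or mean value, of $x$. Letting
$q(r) = \text{Prob}\,\{N = r\}$ and using first the independence of $N$ and the $t_j$'s and then
the fact that the $t_j$'s are independent and identically distributed we obtain

$$\sum_{r=0}^{\infty} q(r)\left(E[\exp\{i(\xi\psi_{\Delta_1}(t) + \eta\psi_{\Delta_2}(t))\}]\right)^r
= \sum_{r=0}^{\infty} q(r)\left(E[\exp\{i\xi\psi_{\Delta_1}(t)\}]\right)^r \sum_{r=0}^{\infty} q(r)\left(E[\exp\{i\eta\psi_{\Delta_2}(t)\}]\right)^r. \tag{2}$$

An easy calculation gives

$$E[\exp\{i\xi\psi_{\Delta_1}(t)\}] = 1 + (e^{i\xi} - 1)\int_{\Delta_1} p(t)\,dt,$$

$$E[\exp\{i\eta\psi_{\Delta_2}(t)\}] = 1 + (e^{i\eta} - 1)\int_{\Delta_2} p(t)\,dt,$$

$$E[\exp\{i(\xi\psi_{\Delta_1}(t) + \eta\psi_{\Delta_2}(t))\}] = 1 + (e^{i\xi} - 1)\int_{\Delta_1} p(t)\,dt + (e^{i\eta} - 1)\int_{\Delta_2} p(t)\,dt.$$

The last equation follows from the fact that $\Delta_1$ and $\Delta_2$ do not overlap. Putting
$\xi = \eta = \pi$, $x = 1 - 2\int_{\Delta_1} p(t)\,dt$, $y = 1 - 2\int_{\Delta_2} p(t)\,dt$, $\varphi(x) = \sum_r q(r)x^r$ we see that
(2) yields the functional equation

$$\varphi(x + y - 1) = \varphi(x)\varphi(y). \tag{3}$$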

One cannot ascertain that (3) holds for all real $x$ and $y$. First of all the defining
power series of $\varphi(x)$ is not known to converge outside the unit circle and secondly
it is not obvious that each pair of real numbers $x$, $y$ between $-1$ and 1 is such
that non-overlapping intervals $\Delta_1$, $\Delta_2$ exist for which

$$x = 1 - 2\int_{\Delta_1} p(t)\,dt \quad \text{and} \quad y = 1 - 2\int_{\Delta_2} p(t)\,dt.$$

² We use the symbol $\bar R$ and $E[R]$ interchangeably to denote the average (mathematical
expectation) of $R$.


However, if one restricts oneself to small $\Delta_1$ and $\Delta_2$ the functional equation (3)
is seen to hold in a sufficiently small neighborhood of 1. This is sufficient (in
view of the analyticity of $\varphi$ in the unit circle) to determine $\varphi(x)$.
In fact, differentiating (3) first with respect to $x$ and then with respect to $y$
we get

$$\varphi'(x)\varphi'(y) = \varphi''(x + y - 1).$$

Letting $y = 1$ and putting $\varphi'(1) = h$ we have

$$\varphi''(x) = h\varphi'(x),$$

which yields immediately

$$\varphi(x) = Ae^{hx} + B.$$

An entirely elementary reasoning (which employs the fact that $Ae^{hx} + B$ must
satisfy (3)) leads to the conclusion that $B = 0$, $A = e^{-h}$ which in turn implies
at once that

$$q(r) = e^{-h}\frac{h^r}{r!}.$$

Finally,

$$\int_{\Delta} P(t)\,dt = E[M(N; \Delta)] = E\left[\sum_{j=1}^{N} \psi_\Delta(t_j)\right]
= \left(\int_{\Delta} p(t)\,dt\right) \sum_{r=0}^{\infty} r e^{-h}\frac{h^r}{r!} = h\int_{\Delta} p(t)\,dt$$

and therefore

$$\int_{-\infty}^{\infty} P(t)\,dt = h, \qquad P(t) = h\,p(t).$$

Since $h$ is the mean value of $N$ (i.e. $\bar N$) we shall use $\bar N$ instead of $h$.
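As a numerical sanity check on the conclusion of Theorem 1, one can confirm that the generating function of the Poisson probabilities does satisfy the functional equation $\varphi(x + y - 1) = \varphi(x)\varphi(y)$. The following is a minimal sketch; the function name `phi`, the truncation at 80 terms, and the sample values of $h$, $x$, $y$ are illustrative assumptions.

```python
from math import exp, factorial, isclose


def phi(x, h, terms=80):
    """Truncated power series sum_r q(r) x**r with q(r) = e**(-h) h**r / r!."""
    return sum(exp(-h) * h ** r / factorial(r) * x ** r for r in range(terms))


h = 3.0
for x, y in [(0.9, 0.8), (0.5, 0.95), (1.0, 0.7)]:
    left = phi(x + y - 1.0, h)
    right = phi(x, h) * phi(y, h)
    print(x, y, left, right)
```

The series sums to $e^{h(x-1)}$, from which the functional equation is immediate, since $e^{h(x+y-2)} = e^{h(x-1)}e^{h(y-1)}$.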


3. Fourier coefficients of $F(t)$ and their statistical properties. In physical
applications it is often convenient to assume that the "pulse function" $f(t)$ is
periodic with period $T$ ($T$ large) and one might therefore restrict oneself to the
interval $(0, T)$.

It is furthermore assumed that both $f(t)$ and $P(t)$ are sufficiently smooth³ so
as to justify the formal operations on Fourier series performed below. Since
we work in the interval $(0, T)$ we assume that $P(t) = 0$ for $t < 0$ and $t > T$.

Expanding $f(t)$ in a Fourier series in $(0, T)$ we get

$$f(t) \sim \sum_{k=-\infty}^{\infty} a(\omega_k) \exp(i\omega_k t), \qquad \omega_k = \frac{2\pi k}{T},$$

³ For instance $f(t)$ and $P(t)$ may be assumed to be of bounded variation. Actually, much
less severe restrictions suffice but in investigations of this sort far reaching generality would
only impair the exposition.


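and thus

$$F(t) \sim \sum_{k=-\infty}^{\infty} a(\omega_k) b(\omega_k) \exp(i\omega_k t),$$

where

$$b(\omega_k) = \sum_{j=1}^{N} \exp(-i\omega_k t_j).$$

Note that

$$E[\exp(-i\omega t)] = \int_0^T \exp(-i\omega t)\,p(t)\,dt = \frac{1}{\bar N}\int_0^T \exp(-i\omega t)\,P(t)\,dt = \frac{T\rho(\omega)}{\bar N} = c(\omega) - i s(\omega),$$

$$E[F(t)] = \bar F(t) = T \sum_{k=-\infty}^{\infty} a(\omega_k)\rho(\omega_k) \exp(i\omega_k t),$$

and put

$$X_k^{(N)} = \frac{\sum_{j=1}^{N} \cos(\omega_k t_j) - \bar N c(\omega_k)}{\sqrt{\bar N}},$$

$$Y_k^{(N)} = \frac{\sum_{j=1}^{N} \sin(\omega_k t_j) - \bar N s(\omega_k)}{\sqrt{\bar N}}.$$

Thus remembering that $\bar N = \int_0^T P(t)\,dt = T\rho(0)$ we may write

$$\frac{F(t) - \bar F(t)}{\sqrt{\bar N}} \sim \sum_{k=-\infty}^{\infty} a(\omega_k)\left(X_k^{(N)} - iY_k^{(N)}\right) \exp(i\omega_k t),$$

$$\frac{F(t) - \bar F(t)}{\sqrt{\rho(0)}} \sim \sqrt{T} \sum_{k=-\infty}^{\infty} a(\omega_k)\left(X_k^{(N)} - iY_k^{(N)}\right) \exp(i\omega_k t).$$

We can now state the following:
THEOREM 2. In the limit as $\bar N \to \infty$ each $X_k^{(N)}$ (and $Y_k^{(N)}$) is normally distributed
with mean 0 and variance $\tfrac12 + \tfrac12 c(2\omega_k)$ $\left(\tfrac12 - \tfrac12 c(2\omega_k)\right)$.
The proof, as usual, is based on the consideration of the characteristic function
of $X_k^{(N)}$.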





We have

$$E[\exp\{i\xi X_k^{(N)}\}] = \exp\{-i\xi\sqrt{\bar N}\,c(\omega_k)\} \exp(-\bar N) \sum_{r=0}^{\infty} \frac{\bar N^r}{r!}\left(E\left[\exp\left\{\frac{i\xi \cos(\omega_k t)}{\sqrt{\bar N}}\right\}\right]\right)^r.$$

In deriving this formula use has been made of the facts that the $t_j$'s are
independent and identically distributed, that $N$ is independent of the $t_j$'s and that
$N$ is distributed according to Poisson's law. It is now easy to see that as $\bar N \to \infty$
the characteristic function of $X_k^{(N)}$ approaches

$$\exp\{-(\tfrac14 + \tfrac14 c(2\omega_k))\xi^2\}$$

uniformly in every finite interval. This, in view of the continuity theorem
for Fourier-Stieltjes transforms, implies our theorem. It should be mentioned
that it is tacitly assumed that even though $\bar N = T\rho(0)$ approaches $\infty$ it does it
in such a way that the ratio $\rho(\omega)/\rho(0)$ (and hence $c(\omega)$) remains constant (or
more generally, approaches a limit).
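A small simulation can illustrate Theorem 2 in the simplest case. The sketch below assumes a uniform density $p(t) = 1/T$ on $(0, T)$, for which $c(\omega_k) = 0$ and $c(2\omega_k) = 0$ when $k \ne 0$, so the predicted variance of $X_k^{(N)}$ is $\tfrac12$. The function names, the seed, the mean $\bar N = 200$ and the number of trials are illustrative assumptions; the Poisson count is generated by counting unit-rate exponential arrivals, since the Python standard library has no Poisson sampler.

```python
import math
import random


def poisson_count(mean, rng):
    """Poisson variate: number of unit-rate exponential arrivals in [0, mean]."""
    count, total = 0, rng.expovariate(1.0)
    while total <= mean:
        count += 1
        total += rng.expovariate(1.0)
    return count


def sample_x(k, mean_n, period, rng):
    """One draw of X_k for uniform pulse times (so that c(omega_k) = 0, k != 0)."""
    n = poisson_count(mean_n, rng)
    omega = 2.0 * math.pi * k / period
    total = sum(math.cos(omega * rng.uniform(0.0, period)) for _ in range(n))
    return total / math.sqrt(mean_n)


rng = random.Random(0)
draws = [sample_x(3, 200.0, 1.0, rng) for _ in range(2000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((x - sample_mean) ** 2 for x in draws) / (len(draws) - 1)
print(sample_mean, sample_var)
```

The sample mean should be close to 0 and the sample variance close to $\tfrac12$, up to sampling error of a few hundredths.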

By considering the characteristic function of the joint distribution of $X_k^{(N)}$
and $X_l^{(N)}$ $(|k| \ne |l|)$ (or any other pair like, for instance, $X_k^{(N)}$ and $Y_l^{(N)}$ in
which case no restriction on $k$, $l$ is necessary) we are able to prove

THEOREM 3. In the limit as $\bar N \to \infty$ the distinct Fourier coefficients of
$(F(t) - E[F(t)])/\sqrt{\rho(0)}$ are normally correlated (i.e. their joint distribution function is the
bivariate normal distribution).

It is also clear that the higher correlations (i.e. between more than two
coefficients) will lead to multivariate normal distributions with coefficients expressible
in terms of Fourier coefficients of $P(t)$.

We do not state Theorem 3 in more definite terms because in the next section 
we shall give a more convenient and useful way of handling correlation proper- 
ties of our Fourier coefficients. 


4. Statistical structure of Fourier coefficients. Let us assume that $P(t) >
\gamma > 0$ and that the Fourier series of $P(t)$ converges everywhere.
Expanding $\sqrt{P(t)}$ in a Fourier series in $(0, T)$ we have

$$\sqrt{P(t)} = \sum_{l=-\infty}^{\infty} \sigma(\omega_l) \exp(i\omega_l t),$$

and in particular (since $p(t) = P(t)/\bar N$)

$$\sqrt{p(t_j)} = \frac{1}{\sqrt{\bar N}} \sum_{l=-\infty}^{\infty} \sigma(\omega_l) \exp(i\omega_l t_j).$$








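We can now write

$$b(\omega_k) = \sum_{j=1}^{N} \exp(-i\omega_k t_j) = \sum_{j=1}^{N} \exp(-i\omega_k t_j)\frac{\sqrt{p(t_j)}}{\sqrt{p(t_j)}}$$

$$= \frac{1}{\sqrt{\bar N}} \sum_{l=-\infty}^{\infty} \sigma(\omega_l) \sum_{j=1}^{N} \frac{\exp(i(\omega_l - \omega_k)t_j)}{\sqrt{p(t_j)}}
= \frac{1}{\sqrt{\bar N}} \sum_{l=-\infty}^{\infty} \sigma(\omega_k + \omega_l) \sum_{j=1}^{N} \frac{\exp(i\omega_l t_j)}{\sqrt{p(t_j)}}$$

$$= T \sum_{l=-\infty}^{\infty} \sigma(\omega_k + \omega_l)\sigma(-\omega_l)
+ \sqrt{T} \sum_{l=-\infty}^{\infty} \sigma(\omega_k + \omega_l)\left\{\frac{1}{\sqrt{T\bar N}} \sum_{j=1}^{N} \frac{\exp(i\omega_l t_j)}{\sqrt{p(t_j)}} - \sqrt{T}\,\sigma(-\omega_l)\right\}.$$

Put $\sigma(\omega) = \alpha(\omega) + i\beta(\omega)$, note that by Parseval's relation

$$\rho(\omega_k) = \sum_{l=-\infty}^{\infty} \sigma(\omega_k + \omega_l)\sigma(-\omega_l),$$

and introduce random variables $U_l^{(N)}$ and $V_l^{(N)}$ by means of the formulas

$$U_l^{(N)} = \frac{1}{\sqrt{T\bar N}} \sum_{j=1}^{N} \frac{\cos(\omega_l t_j)}{\sqrt{p(t_j)}} - \sqrt{T}\,\alpha(\omega_l),$$

$$V_l^{(N)} = -\frac{1}{\sqrt{T\bar N}} \sum_{j=1}^{N} \frac{\sin(\omega_l t_j)}{\sqrt{p(t_j)}} - \sqrt{T}\,\beta(\omega_l).$$

Then

$$b(\omega_k) = T\rho(\omega_k) + \sqrt{T} \sum_{l=-\infty}^{\infty} \sigma(\omega_k + \omega_l)\left(U_l^{(N)} - iV_l^{(N)}\right)$$

and we have the following theorem.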
THEOREM 4. In the limit as $\bar N \to \infty$ the random variables $U_1^{(N)}, V_1^{(N)}, U_2^{(N)},
V_2^{(N)}, \cdots$ are independent and normally distributed (each with mean 0 and
variance $\tfrac12$).

This theorem can be proved in a manner exactly analogous to that of Theorem
2. We need only consider the characteristic functions of the joint distributions
of $U$'s and $V$'s and treat them in the same way as we treated the characteristic
function of the distribution of a single $X$ in the proof of Theorem 2. One thing,
however, should be strongly emphasized. The proof of independence (in the
limit as $\bar N \to \infty$) of $U_l^{(N)}$ and $U_m^{(N)}$ $(|l| \ne |m|)$, for instance, depends on proving
that

$$E[U_l^{(N)} U_m^{(N)}] = 0.$$

This in turn depends essentially on the fact that $N$ is distributed according to
Poisson's law.
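In fact,

$$E[U_l^{(N)} U_m^{(N)}] = E\left[\frac{1}{T\bar N}\left(\sum_{j=1}^{N} \frac{\cos(\omega_l t_j)}{\sqrt{p(t_j)}}\right)\left(\sum_{j=1}^{N} \frac{\cos(\omega_m t_j)}{\sqrt{p(t_j)}}\right)\right] - T\alpha(\omega_l)\alpha(\omega_m).$$

But

$$E\left[\left(\sum_{j=1}^{N} \frac{\cos(\omega_l t_j)}{\sqrt{p(t_j)}}\right)\left(\sum_{j=1}^{N} \frac{\cos(\omega_m t_j)}{\sqrt{p(t_j)}}\right)\right]
= E\left[\sum_{j=1}^{N} \frac{\cos \omega_l t_j \cos \omega_m t_j}{p(t_j)}\right] + E\left[\sum_{j \ne k} \frac{\cos \omega_l t_j \cos \omega_m t_k}{\sqrt{p(t_j)}\sqrt{p(t_k)}}\right]$$

$$= \frac{\overline{N^2} - \bar N}{\bar N}\,T^2\alpha(\omega_l)\alpha(\omega_m),$$

and finally

$$E[U_l^{(N)} U_m^{(N)}] = \left(\frac{\overline{N^2} - \bar N}{\bar N^2} - 1\right) T\alpha(\omega_l)\alpha(\omega_m).$$

Since for Poisson's distribution $\overline{N^2} = \bar N + (\bar N)^2$ we get

$$E[U_l^{(N)} U_m^{(N)}] = 0.$$

Also the proof that $E[|U_l^{(N)}|^2] = \tfrac12$ employs essentially the fact that $N$ is
distributed according to Poisson's law.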

In view of Theorem 4 we can restate Theorem 3 in a form which is both useful
and illuminating inasmuch as it describes completely the statistical structure
of the $b(\omega_k)$'s and hence of the Fourier coefficients of $F(t)$.

THEOREM 5. For the purposes of finding correlations between the $b(\omega_k)$'s it
suffices to replace each $b(\omega_k)$ (in the limit as $\bar N \to \infty$) by its "statistical
representation"

$$T\rho(\omega_k) + \sqrt{T} \sum_{l=-\infty}^{\infty} \sigma(\omega_k + \omega_l)A_l,$$

where $A_{-l}$ is the complex conjugate of $A_l$, $A_0, A_1, A_2, \cdots$ a sequence of independent
complex-valued random variables and each $A_l$ is distributed in such a way that
$\theta_l = \arg A_l$ is uniformly distributed independent of $|A_l|$ and the density of the
probability distribution of $|A_l|$ is

$$2\lambda e^{-\lambda^2}, \qquad (\lambda > 0).$$


Theorem 5 was proved under the assumption $P(t) > \gamma > 0$. This assumption
was needed to validate the convenient artifice of multiplying and dividing by
$\sqrt{p(t_j)}$.

However, even in the case when $P(t)$ is not bounded from below by a positive
number (it is always true that $P(t) \ge 0$) Theorem 5 remains true. It could be
proved by direct but tedious considerations suggested in section 2.

Theorems 4 and 5 can be easily extended to the case when the pulses all have
the same shape but may, at random, differ in magnitude. In other words,
instead of sum (1) we may consider the sum

$$F(t) = \sum_{j=1}^{N} \epsilon_j f(t - t_j), \tag{4}$$


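where the individual pulses are independent and a function $P(\epsilon, t)$ is given such
that

$$\int_{\epsilon}^{\epsilon + \Delta\epsilon} \int_{t}^{t + \Delta t} P(\epsilon, t)\,dt\,d\epsilon$$

is the average number of pulses of "amplitude" between $\epsilon$ and $\epsilon + \Delta\epsilon$ occurring
between $t$ and $t + \Delta t$.

Theorems 4 and 5 still hold provided one replaces the Fourier coefficients of
$P(t)$ by those of

$$\int_{-\infty}^{\infty} \epsilon P(\epsilon, t)\,d\epsilon,$$

and the Fourier coefficients of $\sqrt{P(t)}$ by those of

$$\sqrt{Q(t)} = \sqrt{\int_{-\infty}^{\infty} \epsilon^2 P(\epsilon, t)\,d\epsilon}.$$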


5. Concluding remarks and summary. If one assumes that the number of
pulses $N$ in the time interval $(0, T)$ is constant instead of being a random variable
obeying Poisson's law, then Theorems 4 and 5 fail. The failure is due to the
fact that, for instance, $E[U_l^{(N)} U_m^{(N)}]$ is no longer 0. However, as $T \to \infty$ the
changes in correlation due to assuming $N$ constant become negligible. On the
other hand if one assumes that the number of pulses in each of the time intervals
$(0, \tau), (\tau, 2\tau), \cdots$ is fixed, the changes in correlations become appreciable.
This case can also be treated by the above methods.

The case in which $p(t)$ is independent of time has been considered in various
connections by Schottky, Uhlenbeck and Goudsmit, and Rice⁴. Their
investigations emphasized the importance and usefulness of the harmonic analysis
of random functions.

⁴ W. SCHOTTKY, Ann. d. Phys. 57 (1919) pp. 541-567.
G. E. UHLENBECK and S. GOUDSMIT, Phys. Rev. 34 (1929) pp. 145-151.
S. O. RICE, mimeographed notes on mathematical analysis of random noise, as yet
unpublished.
The authors are indebted to Mr. Rice for making his notes available to them.

In conclusion we summarize our results for the case of time-dependent $P(\epsilon, t)$
by observing that in applications one may replace $F(t)$ by its "statistical
representation"

$$E[F(t)] + \sqrt{T} \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} a(\omega_k)\sigma(\omega_k + \omega_l)A_l \exp(i\omega_k t), \tag{5}$$

where

$$\omega_k = \frac{2\pi k}{T},$$

$$E[F(t)] = T \sum_{k=-\infty}^{\infty} a(\omega_k)\rho(\omega_k) \exp(i\omega_k t),$$

$$\int_{-\infty}^{\infty} \epsilon P(\epsilon, t)\,d\epsilon = \sum_{k=-\infty}^{\infty} \rho(\omega_k) \exp(i\omega_k t),$$

$$\sqrt{Q(t)} = \sqrt{\int_{-\infty}^{\infty} \epsilon^2 P(\epsilon, t)\,d\epsilon} = \sum_{k=-\infty}^{\infty} \sigma(\omega_k) \exp(i\omega_k t),$$

and the $A_l$'s are normally distributed complex-valued random variables for which

$$E[A_l] = 0, \qquad E[|A_l|^2] = 1, \qquad A_{-l} = \bar A_l.$$

Furthermore, for $l \ge 0$ the $A_l$'s are statistically independent.
Thus

$$F(t) - E[F(t)] \sim \sum_{k=-\infty}^{\infty} \lambda(\omega_k) \exp(i\omega_k t)$$

where the $\lambda$'s are normally distributed complex-valued random variables obeying
the relation

$$E[|\lambda(\omega)|^2] = |a(\omega)|^2 \int_0^T Q(t)\,dt.$$

If $Q(t)$ is periodic with frequency $\omega_0/2\pi$ then it follows that $\lambda(\omega')$ and $\lambda(\omega'')$ are
independent unless $\omega' + \omega''$ or $\omega' - \omega''$ is an integral multiple of $\omega_0$.
Finally, we mention that $F(t) - E[F(t)]$ is normally distributed with variance
$s^2(t)$ given by the formula

$$s^2(t) = E[(F(t) - E[F(t)])^2] = T \sum_{k=-\infty}^{\infty} \gamma(\omega_k)\mu(\omega_k) \exp(i\omega_k t),$$

where $\gamma(\omega_k)$ is the Fourier coefficient of $Q(t)$ and $\mu(\omega_k)$ the Fourier coefficient
of $f^2(t)$.

















RANDOM ALMS 


By PAUL R. HALMOS


Syracuse University 


1. Statement of the problem. Consider the problem of distributing one
pound of gold dust at random among a countably infinite set of beggars. Let the
beggars be enumerated and let the procedure for distribution be as follows:
the first beggar is given a random portion of the gold; the second beggar gets a
random portion of the remainder; $\cdots$ and so on ad infinitum. In this description
the phrase "random portion" occurs an infinite number of times: it seems
reasonable to require that it have the same interpretation each time. To be
precise: let $x_j$ $(j = 0, 1, 2, \cdots)$ be the amount received by the $j$th beggar. Let
the distribution of $x_0$ be given by a density function $p(\lambda)$:

$$p(\lambda) \ge 0, \qquad 0 \le \lambda \le 1; \tag{1}$$

$$\int_0^1 p(\lambda)\,d\lambda = 1; \tag{2}$$

$$P(a < x_0 < b) = \int_a^b p(\lambda)\,d\lambda, \qquad 0 \le a < b \le 1. \tag{3}$$


After the first beggar has received his alms and the amount of gold dust left is
$\mu$ (i.e. $x_0 = 1 - \mu$), the value of $x_1$ will be between 0 and $\mu$. The uniformity
requirement mentioned above means that the proportion of $\mu$ that the second
beggar is to receive is again determined by the probability density $p$: in other
words the conditional probability that $x_1$ be between $\lambda\mu$ and $(\lambda + d\lambda)\mu$, given
that $x_0 = 1 - \mu$, is $p(\lambda)\,d\lambda$. In symbols:

$$P(a\mu < x_1 < b\mu \mid x_0 = 1 - \mu) = \int_a^b p(\lambda)\,d\lambda. \tag{4}$$

Writing $\alpha = a\mu$, $\beta = b\mu$, (4) becomes

$$P(\alpha < x_1 < \beta \mid x_0 = 1 - \mu) = \int_\alpha^\beta \frac{1}{\mu}\,p\!\left(\frac{\lambda}{\mu}\right) d\lambda. \tag{5}$$
a Mo 


More generally I shall assume that the conditional probability distribution of
$x_n$, assuming that after the preceding donations there is left an amount $\mu$, is
given in the interval $(0, \mu)$ by $\dfrac{1}{\mu}\,p\!\left(\dfrac{\lambda}{\mu}\right)$. In symbols:

$$P\Big(a < x_n < b \;\Big|\; \sum_{i<n} x_i = 1 - \mu\Big) = \int_a^b \frac{1}{\mu}\,p\!\left(\frac{\lambda}{\mu}\right) d\lambda, \qquad 0 \le a < b \le \mu. \tag{6}$$

This assumption completely determines (in terms of $p$) the joint distribution
of the whole infinite sequence $\{x_0, x_1, x_2, \cdots\}$. Several interesting special
questions may be asked about this distribution. For example: What are the
expectation, dispersion, and higher moments of the $x_n$? What, similarly, are
the moments of the partial sum $S_n = \sum_{j \le n} x_j$? More generally what are the
exact distributions of $x_n$ and of $S_n$? Will the process described really distribute
all the gold, or is there a positive probability that some is left even after every
beggar had his turn? What is the rate of convergence of the series $\sum_{j \ge 0} x_j$?
It is the purpose of this paper to answer these and a few related questions.


2. Calculation of distributions. The $n + 1$ dimensional probability density
of the distribution of $(x_0, x_1, \cdots, x_n)$ is given by¹

$$\prod_{i \le n} \frac{1}{1 - \sum_{j<i} \lambda_j}\; p\!\left(\frac{\lambda_i}{1 - \sum_{j<i} \lambda_j}\right) \tag{7}$$

in the region defined by $\lambda_i \ge 0$, $\lambda_0 + \cdots + \lambda_n \le 1$. For $n = 0$ there is only
one term in the product and that one is equal to $p(\lambda_0)$; the region is defined by
$0 \le \lambda_0 \le 1$. The formula reduces in this case to the definition of the distribution
of $x_0$. The general case follows inductively by the use of the conditional
probability formula (6). (For example: $P(x_0 = \lambda_0, x_1 = \lambda_1) = P(x_0 = \lambda_0)\,P(x_1 =
\lambda_1 \mid x_0 = \lambda_0) = p(\lambda_0)\,\dfrac{1}{1 - \lambda_0}\,p\!\left(\dfrac{\lambda_1}{1 - \lambda_0}\right)$.)


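From (7) it is possible in principle to calculate the densities of the distributions
of $x_n$ and of $S_n$. Thus for example the density $q_n$ of the distribution of $x_n$
is found by integrating out the $\lambda_j$ with $j < n$ from (7), so that

$$q_n(\lambda_n) = \int \cdots \int \prod_{i \le n} \frac{1}{1 - \sum_{j<i} \lambda_j}\; p\!\left(\frac{\lambda_i}{1 - \sum_{j<i} \lambda_j}\right) d\lambda_0 \cdots d\lambda_{n-1}, \tag{8}$$

where the integration is extended over the region defined by $\lambda_j \ge 0$ $(0 \le j < n)$,
$\sum_{j \le n} \lambda_j \le 1$. Similarly $V_n(t) = P(S_n < t)$ is given by

$$V_n(t) = \int \cdots \int \prod_{i \le n} \frac{1}{1 - \sum_{j<i} \lambda_j}\; p\!\left(\frac{\lambda_i}{1 - \sum_{j<i} \lambda_j}\right) d\lambda_0 \cdots d\lambda_n \tag{9}$$

$(0 \le t \le 1)$ where the domain of integration is defined by $\lambda_j \ge 0$ $(0 \le j \le n)$,
$\sum_{j \le n} \lambda_j \le t$.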

Working with integrals of the type (8) and (9) is often greatly facilitated by the
substitution $\mu_i = \sum_{j \le i} \lambda_j$, $\lambda_i = \mu_i - \mu_{i-1}$, $0 \le i \le n$. The Jacobian of this
linear change of variables is identically one. The domain of integration used
in (9) is defined in terms of the $\mu$'s by $0 \le \mu_0 \le \mu_1 \le \cdots \le \mu_n \le t \le 1$, so that

$$V_n(t) = \int_0^t d\mu_n \int_0^{\mu_n} d\mu_{n-1} \cdots \int_0^{\mu_1} d\mu_0 \prod_{i \le n} \frac{1}{1 - \mu_{i-1}}\; p\!\left(\frac{\mu_i - \mu_{i-1}}{1 - \mu_{i-1}}\right). \tag{10}$$

¹ A summation or a product extended over an empty set of indices will, as is customary,
be interpreted as 0 or 1 respectively. Since throughout this paper only non-negative indices
are considered, whenever the notation indicates a negative index the quantity to which it is
attached is to be interpreted as 0.


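Hence the density of the distribution of $S_n$ is

$$v_n(t) = \int_0^t d\mu_{n-1} \int_0^{\mu_{n-1}} d\mu_{n-2} \cdots \int_0^{\mu_1} d\mu_0 \left[\prod_{i<n} \frac{1}{1 - \mu_{i-1}}\; p\!\left(\frac{\mu_i - \mu_{i-1}}{1 - \mu_{i-1}}\right)\right] \frac{1}{1 - \mu_{n-1}}\; p\!\left(\frac{t - \mu_{n-1}}{1 - \mu_{n-1}}\right). \tag{11}$$

For later purposes it is more convenient to set $t = \mu_n$ in (11) and to express
$v_n(\mu_n)$ as a multiple (and not as an iterated) integral; then

$$v_n(\mu_n) = \int \cdots \int \prod_{i \le n} \frac{1}{1 - \mu_{i-1}}\; p\!\left(\frac{\mu_i - \mu_{i-1}}{1 - \mu_{i-1}}\right) d\mu_0 \cdots d\mu_{n-1}, \tag{12}$$

where the domain of integration is defined by $0 \le \mu_0 \le \mu_1 \le \cdots \le \mu_{n-1} \le
\mu_n \le 1$. The integrals (8) and (12) are explicitly evaluated below for a special
case.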
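It is possible from (8) to find the $k$th moment $M_k^{(n)}$ of $x_n$, $M_k^{(n)} =
\int_0^1 \lambda_n^k q_n(\lambda_n)\,d\lambda_n$. Write

$$\alpha_k = \int_0^1 \lambda^k p(\lambda)\,d\lambda, \qquad \beta_k = \int_0^1 (1 - \lambda)^k p(\lambda)\,d\lambda.$$

Clearly $M_k^{(n)}$ is obtained from (8) upon multiplication by $\lambda_n^k$ and integration
with respect to $\lambda_n$.

$$M_k^{(n)} = \int \cdots \int \lambda_n^k \prod_{i \le n} \frac{1}{1 - \sum_{j<i} \lambda_j}\; p\!\left(\frac{\lambda_i}{1 - \sum_{j<i} \lambda_j}\right) d\lambda_0 \cdots d\lambda_n. \tag{13}$$

It is advantageous once again to write $\mu_i = \sum_{j \le i} \lambda_j$. The resulting integral
may be written in the iterated form as follows:

$$M_k^{(n)} = \int_0^1 d\mu_0 \int_{\mu_0}^1 d\mu_1 \cdots \int_{\mu_{n-1}}^1 d\mu_n \prod_{i \le n} \frac{1}{1 - \mu_{i-1}}\; p\!\left(\frac{\mu_i - \mu_{i-1}}{1 - \mu_{i-1}}\right) (\mu_n - \mu_{n-1})^k. \tag{14}$$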


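Consider separately the innermost integral

$$J = \int_{\mu_{n-1}}^1 p\!\left(\frac{\mu_n - \mu_{n-1}}{1 - \mu_{n-1}}\right) (\mu_n - \mu_{n-1})^k\, \frac{d\mu_n}{1 - \mu_{n-1}}.$$

Writing $\lambda = (\mu_n - \mu_{n-1})/(1 - \mu_{n-1})$ this becomes

$$J = \int_0^1 p(\lambda)\lambda^k (1 - \mu_{n-1})^k\,d\lambda = \alpha_k (1 - \mu_{n-1})^k.$$

Hence

$$M_k^{(n)} = \alpha_k \int_0^1 d\mu_0 \cdots \int_{\mu_{n-2}}^1 d\mu_{n-1} \prod_{i<n} \frac{1}{1 - \mu_{i-1}}\; p\!\left(\frac{\mu_i - \mu_{i-1}}{1 - \mu_{i-1}}\right) (1 - \mu_{n-1})^k. \tag{15}$$

The innermost integral this time is

$$J' = \int_{\mu_{n-2}}^1 p\!\left(\frac{\mu_{n-1} - \mu_{n-2}}{1 - \mu_{n-2}}\right) (1 - \mu_{n-1})^k\, \frac{d\mu_{n-1}}{1 - \mu_{n-2}}.$$

Write $\lambda = (\mu_{n-1} - \mu_{n-2})/(1 - \mu_{n-2})$; then $(1 - \mu_{n-1}) = (1 - \lambda)(1 - \mu_{n-2})$ and

$$J' = \int_0^1 p(\lambda)(1 - \lambda)^k (1 - \mu_{n-2})^k\,d\lambda = \beta_k (1 - \mu_{n-2})^k.$$

Hence, finally,

$$M_k^{(n)} = \alpha_k \beta_k \int_0^1 d\mu_0 \cdots \int_{\mu_{n-3}}^1 d\mu_{n-2} \prod_{i<n-1} \frac{1}{1 - \mu_{i-1}}\; p\!\left(\frac{\mu_i - \mu_{i-1}}{1 - \mu_{i-1}}\right) (1 - \mu_{n-2})^k. \tag{16}$$

Observe now that the right member of (16) (except for the factor $\beta_k$) may be
obtained from (15) upon replacing $n$ by $n - 1$. In other words $M_k^{(n)} = \beta_k M_k^{(n-1)}$.
Since $M_k^{(0)} = \alpha_k$, it follows that

$$M_k^{(n)} = \alpha_k \beta_k^n, \qquad n = 0, 1, 2, \cdots. \tag{17}$$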


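Instead of calculating similarly the moments $\int_0^1 \mu_n^k v_n(\mu_n)\,d\mu_n$ of $S_n$ it is more
convenient to calculate the quantities

$$N_k^{(n)} = \int_0^1 (1 - \mu_n)^k v_n(\mu_n)\,d\mu_n.$$

The moments themselves may be obtained from the $N$'s by simple combinatorial
formulas.
It follows from (12) that

$$N_k^{(n)} = \int_0^1 d\mu_0 \int_{\mu_0}^1 d\mu_1 \cdots \int_{\mu_{n-1}}^1 d\mu_n \prod_{i \le n} \frac{1}{1 - \mu_{i-1}}\; p\!\left(\frac{\mu_i - \mu_{i-1}}{1 - \mu_{i-1}}\right) (1 - \mu_n)^k. \tag{18}$$

The innermost integral in (18) is

$$J'' = \int_{\mu_{n-1}}^1 p\!\left(\frac{\mu_n - \mu_{n-1}}{1 - \mu_{n-1}}\right) (1 - \mu_n)^k\, \frac{d\mu_n}{1 - \mu_{n-1}}.$$

Writing $\lambda = (\mu_n - \mu_{n-1})/(1 - \mu_{n-1})$, $(1 - \mu_n)$ becomes $(1 - \lambda)(1 - \mu_{n-1})$, so that

$$J'' = \int_0^1 p(\lambda)(1 - \lambda)^k (1 - \mu_{n-1})^k\,d\lambda = \beta_k (1 - \mu_{n-1})^k.$$

Consequently

$$N_k^{(n)} = \beta_k \int_0^1 d\mu_0 \cdots \int_{\mu_{n-2}}^1 d\mu_{n-1} \prod_{i<n} \frac{1}{1 - \mu_{i-1}}\; p\!\left(\frac{\mu_i - \mu_{i-1}}{1 - \mu_{i-1}}\right) (1 - \mu_{n-1})^k = \beta_k N_k^{(n-1)}, \tag{19}$$

so that

$$N_k^{(n)} = \beta_k^{n+1}, \qquad n = 0, 1, 2, \cdots. \tag{20}$$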

The additivity of the first moment yields an amusing check on (17) and (20).
Since $E(S_n) = E(\sum_{j \le n} x_j) = \sum_{j \le n} E(x_j)$ (where $E$ denotes expectation, or
first moment), it should be true that $1 - N_1^{(n)} = \sum_{j \le n} M_1^{(j)}$. In terms of $\alpha$'s
and $\beta$'s this means $1 - \beta_1^{n+1} = \alpha_1 \sum_{j \le n} \beta_1^j$, and this in turn reduces to the trivial
identity $\alpha_1 = 1 - \beta_1$.
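The check just described can be carried out in exact rational arithmetic. The following sketch takes the moment formulas in the form $M_k^{(n)} = \alpha_k\beta_k^n$ and $N_k^{(n)} = \beta_k^{n+1}$; the function names are hypothetical, and the two sample densities, $p(\lambda) = 1$ with $\alpha_1 = \beta_1 = \tfrac12$ and $p(\lambda) = 2\lambda$ with $\alpha_1 = \tfrac23$, $\beta_1 = \tfrac13$, are chosen only for illustration.

```python
from fractions import Fraction


def moment_x(alpha_k, beta_k, n):
    """k-th moment of x_n, taken as alpha_k * beta_k**n."""
    return alpha_k * beta_k ** n


def moment_remainder(beta_k, n):
    """k-th moment of the remainder 1 - S_n, taken as beta_k**(n + 1)."""
    return beta_k ** (n + 1)


def additivity_holds(alpha_1, beta_1, n):
    """Check that 1 - N_1^(n) equals the sum of M_1^(j) for j <= n."""
    lhs = 1 - moment_remainder(beta_1, n)
    rhs = sum(moment_x(alpha_1, beta_1, j) for j in range(n + 1))
    return lhs == rhs


print(additivity_holds(Fraction(1, 2), Fraction(1, 2), 10))
print(additivity_holds(Fraction(2, 3), Fraction(1, 3), 7))
```

The identity holds exactly when $\alpha_1 = 1 - \beta_1$, which is true of every density $p$ since $\lambda + (1 - \lambda) = 1$; a pair that violates this relation fails the check.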

Since $0 \le \sum_{j \le n} x_j \le 1$ with probability 1 for every $n$, it is clear that the series
$\sum_{j \ge 0} x_j$ converges with probability 1 to a sum $x$, $0 \le x \le 1$. Since $x =
\sum_{j \ge 0} x_j$ and since $E(x) = \sum_{j \ge 0} E(x_j)$, it follows that $E(x) = \sum_{j \ge 0} \alpha_1 \beta_1^j = \alpha_1/
(1 - \beta_1) = 1$. This implies (since $0 \le x \le 1$) that $x$ must be equal to 1 with
probability 1. In other words it is almost certain that all the gold dust will
eventually be distributed.





3. Product representation. Considerable light is shed on some of the above 
computations (and in fact the moment formulas (17) and (20) are proved anew) 
by the following considerations. The principle of equitable treatment enun- 
ciated in the introductory paragraph was subsequently formalized by the condi- 
tional probability relation (6). It may also be formalized by the following 
(equivalent) procedure. Let $y_0, y_1, y_2, \cdots$ be a sequence of independent
chance variables each of whose distributions is given by the probability density $p$;
let $y_n$ be interpreted as the proportion, of the amount available to the $n$th beggar,
that he actually receives. In other words

$$x_n = y_n\Big(1 - \sum_{j<n} x_j\Big), \qquad n = 0, 1, 2, \cdots. \tag{21}$$

The first main problem in this formulation is to express the $x$'s in terms of the
$y$'s. This is most easily accomplished by an inductive proof of the formula

$$\sum_{j \le n} x_j = 1 - \prod_{j \le n} (1 - y_j). \tag{22}$$

For $n = 0$, (22) asserts merely that $x_0 = y_0$. The inductive step proceeds as
follows:

$$\sum_{j \le n} x_j = x_n + \sum_{j \le n-1} x_j = y_n\Big(1 - \sum_{j \le n-1} x_j\Big) + \sum_{j \le n-1} x_j$$

$$= y_n \prod_{j \le n-1} (1 - y_j) + 1 - \prod_{j \le n-1} (1 - y_j)$$

$$= 1 - (1 - y_n) \prod_{j \le n-1} (1 - y_j) = 1 - \prod_{j \le n} (1 - y_j).$$

From (22) it follows that

$$x_n = y_n \prod_{j<n} (1 - y_j) \tag{23}$$

and

$$R_n = 1 - S_n = 1 - \sum_{j \le n} x_j = \prod_{j \le n} (1 - y_j). \tag{24}$$
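The product representation lends itself to a direct numerical check. The sketch below builds the alms from the recursion $x_n = y_n(1 - \sum_{j<n} x_j)$ and compares them with the closed forms $x_n = y_n\prod_{j<n}(1 - y_j)$ and $1 - \sum_{j \le n} x_j = \prod_{j \le n}(1 - y_j)$. The function names are hypothetical, and uniformly distributed proportions are used only as an example.

```python
import random


def alms_from_proportions(ys):
    """Amounts x_0, x_1, ... obtained from the recursion x_n = y_n * (1 - sum of earlier x)."""
    xs, given = [], 0.0
    for y in ys:
        x = y * (1.0 - given)
        xs.append(x)
        given += x
    return xs


def remainder_product(ys):
    """Product of (1 - y_j) over all supplied proportions."""
    prod = 1.0
    for y in ys:
        prod *= 1.0 - y
    return prod


rng = random.Random(42)
ys = [rng.random() for _ in range(10)]
xs = alms_from_proportions(ys)

print(1.0 - sum(xs), remainder_product(ys))
```

The two printed numbers agree up to rounding error, whatever proportions are supplied.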
The moment formulas (17) and (20) follow immediately from (23) and (24)
respectively.
Another very important application of (23) and (24) is the following theorem.
If the first geometric moment (geometric mean)

$$r = \exp\{E(\log [1 - y_j])\} = \exp\left\{\int_0^1 \log(1 - \lambda)\,p(\lambda)\,d\lambda\right\}$$

is different from zero (i.e. if $\int_0^1 \log(1 - \lambda)\,p(\lambda)\,d\lambda$ is finite) then the limits

$$\lim_{n \to \infty} (x_n/y_n)^{1/n} \quad \text{and} \quad \lim_{n \to \infty} R_n^{1/n}$$

both exist and are both equal to $r$.
Since according to (23) and (24), $x_n/y_n = R_{n-1}$ the two parts of the conclusion
are seen to be equivalent. For the proof take the logarithm of both sides of (24)
and divide by $n$, obtaining

$$\log R_n^{1/n} = \frac{1}{n} \sum_{j \le n} \log(1 - y_j). \tag{25}$$

Since, according to the hypotheses stated, the chance variables $\log(1 - y_j)$
are independent and all have the same distribution with a finite expectation,
the strong law of large numbers applies to the right side of (25) and (after
taking exponentials) yields the desired conclusion.

The result just obtained may be phrased as follows: with probability 1, $x_n$
is asymptotically equal to $r^n y_n$. This statement shows that in an obvious if
somewhat crude sense the rate of convergence of $\sum_{j \ge 0} x_j$ is that (at least) of a
geometric series with ratio r. This conclusion is further supported by the
behavior of $R_n^{1/n}$, which again is the sort of thing one expects from a geometric
series. (That is: the nth root of the nth remainder of a geometric series always
does converge to the common ratio.) As usual, more delicate quantitative
results concerning the rate of convergence may be obtained by applying to
(25) not merely the law of large numbers but the law of the iterated logarithm.
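
The product representation and the limit theorem lend themselves to a quick numerical illustration. The sketch below is not part of the paper: the function names are hypothetical, and it assumes the uniform density p = 1, for which the text gives r = 1/e. It checks (22) on three fixed proportions and then estimates $R_n^{1/n}$ from the logarithmic form (25).

```python
import math
import random


def alms_from_proportions(ys):
    """Turn proportions y_0, y_1, ... into alms x_0, x_1, ... using (21)."""
    alms = []
    given_away = 0.0
    for y in ys:
        x_n = y * (1.0 - given_away)  # (21): a share y_n of what is still left
        alms.append(x_n)
        given_away += x_n
    return alms


def nth_root_of_remainder(n, rng):
    """exp of the right side of (25), i.e. R_n^(1/n), for uniform y_j.

    Summing logarithms avoids the underflow of the raw product (24).
    """
    log_sum = sum(math.log(1.0 - rng.random()) for _ in range(n + 1))
    return math.exp(log_sum / n)


# Product representation (22): the partial sum equals 1 - prod(1 - y_j).
ys = [0.5, 0.25, 0.1]
alms = alms_from_proportions(ys)
product = math.prod(1.0 - y for y in ys)

# Rate of convergence: R_n^(1/n) should be close to r = 1/e when p = 1.
estimate = nth_root_of_remainder(20000, random.Random(1944))
```

The seed and the sample length are arbitrary choices; any density p on (0, 1) with a finite value of $E(\log[1 - y_j])$ could replace the uniform draw.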

The product representation of $x_n$ in formula (23) points the way to a generalization
of this theory which may be of some interest. In this generalization $x_n$
is still defined by (23) and the y's are still independent, but the distribution of $y_j$
is given by a density $p_j$, where the p's need not be equal to each other. In
terms of random alms this means that the condition of equitable treatment is
replaced by the following weaker condition: the probability distribution of the
amount that the jth beggar receives depends only on j and on the amount left
by the preceding beggars, and in particular does not depend on the sizes of the
alms already distributed. Many of the conclusions obtained under the simpler
hypotheses carry over to this generalized case with only slight changes. In
particular the distribution formulas (7), (8), and (12), and the moment formulas
(17) and (20), are changed only to the extent of acquiring an extra subscript
due to the difference of the $p_j$.


4. Applications. (A) The original motivation of the present work was an 
investigation of the notion of a random mass distribution, and the results ob- 
tained may be considered as one possible solution of the problem of defining 
randomness for mass distributions in the special (discrete) case where the entire 
mass is concentrated on the non-negative integers. It would be of great interest 
to extend the results of this note to various continuous cases in which the set of 
integers is replaced by the unit interval, or the entire real line, or n dimensional 
Euclidean space. I intend to study some of these extensions at another time;
at the moment I merely mention one implication of this statistical point of view. 

Considering the sequence $\{x_0, x_1, x_2, \cdots\}$ as a system of weights, the integer
n carrying the weight $x_n$, various questions may be raised concerning properties
of the discrete mass distributions so obtained. For example: do the moments
$m_k = \sum_{n \ge 0} n^k x_n$ exist and, if so, what are their averages and dispersions and,
more generally, their moments and their distributions? I shall settle here the
questions concerning existence and expectation.

The chance variable $m_k$ is non-negative and, even if it is infinite with positive
probability, its expectation is defined by $E(m_k) = \sum_{n \ge 0} n^k E(x_n) = \sum_{n \ge 0} n^k \mu_1 \nu_1^n = \mu_1 \sum_{n \ge 0} n^k \nu_1^n$. Since $0 < \nu_1 < 1$, the last written series converges
and therefore $E(m_k)$ is finite. This implies that $m_k$ is finite with probability 1.

(B) It has been observed that the logarithms of the sizes of particles such as
mineral grains are frequently normally distributed. Kolmogoroff² has given
an explanation of this phenomenon; the results of the present paper yield an
alternative and in some respects simpler explanation. Suppose in fact that the
probability of a particle losing a chip the proportion of whose size to the size
of the original particle is between $\lambda$ and $\lambda + d\lambda$ is $p(\lambda)\, d\lambda$. With this stochastic
scheme the size of the remaining particle after n chips have been lost is given by
$R_n$. Since, by (25), $\log R_n$ is a sum of independent chance variables with the
same distributions, the Laplace-Liapounoff theorem may be invoked to show
that the distribution of $\log R_n$ is for large n nearly normal. (It is necessary of course
to assume here the finiteness of the second geometric moment, or equivalently
of the integral $\int_0^1 \log^2(1 - \lambda)\, p(\lambda)\, d\lambda$.) The mean and the variance of each
summand of $\log R_n$ are

$a = \int_0^1 \log(1 - \lambda)\, p(\lambda)\, d\lambda$ and $b^2 = \int_0^1 [\log(1 - \lambda) - a]^2\, p(\lambda)\, d\lambda,$


² A. N. Kolmogoroff, "Ueber das logarithmisch normale Verteilungsgesetz der Dimensionen
der Teilchen bei Zerstueckelung," C. R. (Doklady) Acad. Sci. URSS (N. S.) Vol.
31 (1941), pp. 99-101.







respectively; consequently (by the additivity of the mean and the variance)
the corresponding parameters of the distribution of $\log R_n$ (and hence of the
approximating normal distribution) are given by $(n + 1)a$ and $(n + 1)b^2$ respectively.
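
As an illustration of these parameters, the following sketch simulates $\log R_n$ for the uniform density p = 1, an assumption made only for the example. In that case direct integration gives a = -1 and $b^2 = 1$, so the sample mean and variance of $\log R_n$ should lie near -(n + 1) and n + 1. The helper names are hypothetical.

```python
import math
import random


def log_remainder(n, rng):
    """log R_n = sum over j <= n of log(1 - y_j), with y_j uniform on (0, 1)."""
    return sum(math.log(1.0 - rng.random()) for _ in range(n + 1))


def sample_mean_and_variance(values):
    """Ordinary sample mean and unbiased sample variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, variance


n = 9  # ten chips lost, so log R_n has n + 1 = 10 summands
rng = random.Random(0)
draws = [log_remainder(n, rng) for _ in range(20000)]
mean, variance = sample_mean_and_variance(draws)

# For p = 1: a = -1 and b^2 = 1, hence the targets -(n + 1) and (n + 1).
target_mean = -(n + 1)
target_variance = n + 1
```

With only ten summands the normal approximation itself is still rough; the sketch checks the two parameters, not the shape of the distribution.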

(C) A special case of the distributions studied in this paper (namely the case
of uniform distribution, $p(\lambda) = 1$) arises in the theory of scattering of neutrons
by protons of the same mass. According to Bethe³: "In each collision with a
proton the neutron will lose energy. As long as the neutron is fast compared
to the proton, the probability that the neutron energy lies between E and E + dE
after the collision, is $w(E)\, dE = dE/E_0$, where $E_0$ is the neutron energy before
the collision. This means that any value of the final energy of the neutron,
between 0 and the initial energy $E_0$, is equally probable."

To calculate explicitly the distributions it is most convenient to start from
(11). If p (with any argument) is replaced by 1 and the terms of the product
are distributed, each under its own differential, (11) takes the form

(26) $v_n(t) = \int_0^t \frac{d\mu_{n-1}}{1 - \mu_{n-1}} \int_0^{\mu_{n-1}} \frac{d\mu_{n-2}}{1 - \mu_{n-2}} \cdots \int_0^{\mu_1} \frac{d\mu_0}{1 - \mu_0}.$

The value of the iterated integral is easy to obtain: $v_n(t) = (-1)^n (1/n!) \log^n(1 - t)$.
Since $v_n(t)$ gives the distribution of the partial sum $S_n$, the distribution
of $R_n = 1 - S_n$ is given by $v_n(1 - t) = (-1)^n (1/n!) \log^n t$.⁴ It is possible
but not necessary to derive similarly the distribution of $x_n$. It is simpler to
obtain this distribution by exploiting the symmetry of the uniform distribution.
Since, according to (23) and (24), $x_n$ and $R_n$ are both products of n + 1 uniformly
and independently distributed chance variables they have the same distribution,
so that the density of the distribution of $x_n$ is also given by $(-1)^n (1/n!) \log^n t$,
$n = 0, 1, 2, \cdots$.
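
The density just obtained can be checked numerically. The sketch below uses hypothetical names and a plain midpoint rule (chosen because it never evaluates the logarithm at t = 0). It integrates $(-1)^n (1/n!) \log^n t$ over (0, 1) and also computes its first moment, which for a product of n + 1 independent uniform variables must equal $2^{-(n+1)}$.

```python
import math


def remainder_density(n, t):
    """Density (-1)^n (1/n!) log^n t of R_n on 0 < t < 1 when p = 1."""
    return (-math.log(t)) ** n / math.factorial(n)


def midpoint_integral(f, cells=200_000):
    """Midpoint rule on (0, 1); the endpoints themselves are never evaluated."""
    h = 1.0 / cells
    return h * sum(f((k + 0.5) * h) for k in range(cells))


n = 3
total_mass = midpoint_integral(lambda t: remainder_density(n, t))
first_moment = midpoint_integral(lambda t: t * remainder_density(n, t))

# A product of n + 1 independent uniform variables has mean 2^-(n+1).
expected_first_moment = 0.5 ** (n + 1)
```

The logarithmic singularity at t = 0 makes the midpoint rule converge slowly, so agreement to about three decimals is all that should be expected from this sketch.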

The roles of the geometric mean r (= 1/e in case p = 1) and of the normal
distribution have also been observed in the physical situation. Fermi⁵ has
expressed the geometric series like behavior of $\sum_{n \ge 0} x_n$ by the statement "...
an impact of a neutron against a proton reduces, on the average, the neutron
energy by a factor 1/e," and Bethe⁶ remarks that "... the actual values of log E
after n collisions form very nearly a Gaussian distribution ..."

³ H. A. Bethe, "Nuclear Physics, B. Nuclear Dynamics, Theoretical," Reviews of Modern
Physics, Vol. 9 (1937), p. 120.

⁴ This distribution has been calculated by E. U. Condon and G. Breit, "The energy
distribution of neutrons slowed by elastic impacts," Physical Review, Vol. 49 (1936), pp.
229-231.

⁵ Quoted by Condon and Breit, loc. cit.
⁶ Loc. cit.






























ON BIASES IN ESTIMATION DUE TO THE USE OF PRELIMINARY 
TESTS OF SIGNIFICANCE 


By T. A. BANCROFT 
Iowa State College 
I. INTRODUCTION 


In problems of statistical estimation we often express the joint frequency
distribution of the sample observations $x_1, x_2, \cdots, x_n$ in the form

(1) $f(x_1, \cdots, x_n; \alpha, \beta, \gamma, \cdots) \prod dx_i,$

where the functional form, f, is assumed known, and $\alpha, \beta, \gamma, \cdots$ are certain population
parameters whose values may or may not be known. Given this specification,
statistical theory provides routine mathematical processes for obtaining
estimates of the parameters $\alpha, \beta, \gamma, \cdots$ from the observations $x_1, x_2, \cdots, x_n$.

In performing tests of significance we often assume that the data follow some
distribution

(2) $f_1(x_1, \cdots, x_n; \alpha, \beta, \gamma, \cdots) \prod dx_i, \quad (i = 1, \cdots, n)$

where $f_1$ is a known function or family of functions. We may wish to test the
hypothesis that the data follow the more specialized distribution

(3) $f_2(x_1, \cdots, x_n; \alpha', \beta', \gamma', \cdots) \prod dx_i, \quad (i = 1, \cdots, n)$

where $f_2$ is some member or sub-group of the family $f_1$. Given this specification,
statistical theory provides routine mathematical processes for testing such
hypotheses.

In the application of statistical theory to specific data, there is often some 
uncertainty about the appropriate specifications in equations (1), (2) and (3).
In such cases preliminary tests of significance have been used, in practice, as 
an aid in choosing a specification. We shall give several examples from the 
literature of statistical methodology. 

(1) In an analysis of variance, in order to obtain a best estimate of variance,
we may be uncertain as to whether two mean squares in the lines of the analysis
may be assumed homogeneous, [1]. Suppose that it is desired to estimate the
variance $\sigma_1^2$, of which an unbiased estimate $s_1^2$ is available. In addition, there
is an unbiased estimate $s_2^2$ of $\sigma_2^2$, where from the nature of the data it is known
that either $\sigma_2^2 = \sigma_1^2$ or $\sigma_2^2 < \sigma_1^2$. As a criterion in making a decision the following
rule of procedure is used frequently: test $s_1^2/s_2^2$ by the F-test, where $s_1^2$ and $s_2^2$
are the two mean squares. If F is not significant at some assigned significance
level use $(n_1 s_1^2 + n_2 s_2^2)/(n_1 + n_2)$ as the estimate of $\sigma_1^2$. If F is significant at the
assigned significance level, use $s_1^2$ as the estimate of $\sigma_1^2$.

(2) After working out the regression of y on a number of independent variates
we may be uncertain as to the appropriateness of the retention of some one of
the independent variates, [2]. To illustrate let us consider the choice between
the regression equations $y = b_1 x_1 + b_2 x_2$ and $y' = b_1' x_1$, after having fitted
$y = b_1 x_1 + b_2 x_2$; the population regression equation being $y = \beta_1 x_1 + \beta_2 x_2$.
In this case a procedure commonly used in deciding whether to retain $x_2$ is as
follows: we test $s_2^2/s_3^2$ by the F-test, where $s_2^2$ is the reduction in sum of squares
due to $x_2$ after fitting $x_1$, and $s_3^2$ is the residual mean square. If F is not significant
at some assigned significance level we omit the term containing $x_2$ and use
$b_1'$ as the estimate of $\beta_1$. If F is significant we retain the term containing $x_2$
and use $b_1$ as the estimate of $\beta_1$. A similar example occurs in fitting a polynomial,
when there is uncertainty as to the appropriate degree for the polynomial [3].

(3) In certain analyses we may be uncertain as to the appropriateness of the
use of the $\chi^2$ test. Bartlett gives an illustration in a discussion of binomial
variation, [4]. He performs two supplementary $\chi^2$ tests of significance as an aid
in deciding to abandon the main use of the $\chi^2$ test altogether, and proceeds to use
an analysis of variance instead. It is of interest to note that the main use of the
$\chi^2$ test gives a significant difference at the 5% level while, in the analysis of
variance, Fisher's z is not significant at the 5% level. Here again we might
formulate a "rule of procedure" and follow through the analysis as in the preceding
cases.

This use of tests of significance as an aid in determining an appropriate speci- 
fication, and hence the form that the completed analysis shall take, involves 
acting as if the null hypothesis is false in those cases in which it is refuted at some 
assigned significance level, and, on the other hand, acting as if the null hypothesis 
is true in those cases in which we fail to refute it at the assigned significance level. 
An investigation of the consequences of some of these uses is the purpose of this 
paper. 

It is proposed to consider the first two cases mentioned above: (1) a test of the 
homogeneity of variances, and (2) a test of a regression coefficient. A complete 
investigation of the consequences of the rules of procedure would be very exten- 
sive, since these consequences depend on the form of the subsequent statistical 
analysis. As a beginning, it is proposed to limit the study to the efficiency of 
these “rules of procedure” in the control of bias. 

The need for solutions of a whole family of problems of this kind has been 
pointed out recently by Berkson [5]. 


II. EXAMPLE ONE: TEST OF HOMOGENEITY OF VARIANCES


1. Statement of the problem. $s_1^2$ and $s_2^2$ are two independent estimates of
variances $\sigma_1^2$ and $\sigma_2^2$ respectively, (such that $n_1 s_1^2/\sigma_1^2$, $n_2 s_2^2/\sigma_2^2$ are distributed
independently according to $\chi^2$ with $n_1$ and $n_2$ degrees of freedom). It is
known that $\sigma_2^2 \le \sigma_1^2$. To obtain from these an estimate of $\sigma_1^2$, to be used in the
particular analysis in hand, we formulate a rule of procedure.


2. Rule of procedure. Test $s_1^2/s_2^2$ by the F-test. If F is non-significant at
some assigned significance level, we use $(n_1 s_1^2 + n_2 s_2^2)/(n_1 + n_2)$ as the estimate
of $\sigma_1^2$. If F is significant at some assigned significance level we use $s_1^2$ as the
estimate of $\sigma_1^2$. The estimate of $\sigma_1^2$ obtained by this rule of procedure will be
denoted by e*.


3. Object of this investigation. If we follow such a rule of procedure, what
will be the bias in our estimate e* of $\sigma_1^2$?


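4. Derivation of the expected value of e*. First we wish to find

$E\left(\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}\right), \quad \frac{s_1^2}{s_2^2} < \lambda,$

where $\lambda$ is the value on the F-distribution corresponding to some assigned significance
level for $n_1$ and $n_2$ degrees of freedom.

Let $v_1 = s_1^2$, $v_2 = s_2^2$. Since $s_1^2$ and $s_2^2$ are independently distributed, the joint
distribution of $v_1$ and $v_2$ is

$c_1 v_1^{\frac{1}{2}n_1 - 1} v_2^{\frac{1}{2}n_2 - 1} \exp\left[-\frac{1}{2}\left(\frac{n_1 v_1}{\sigma_1^2} + \frac{n_2 v_2}{\sigma_2^2}\right)\right] dv_1\, dv_2,$

where $c_1$ is a constant and $n_1$ and $n_2$ are the respective degrees of freedom.
Let us make the transformation of variables

$u_1 = n_1 v_1 + n_2 v_2, \quad 0 < u_1 < \infty,$

$u_2 = \frac{v_1}{v_2}, \quad 0 < u_2 < \lambda,$

then the expected value, $E_1$, of $\frac{u_1}{n_1 + n_2}$ for $u_2 < \lambda$ is given by

$(n_1 + n_2) E_1 = \frac{c_1}{P(u_2 < \lambda)} \int_0^{\lambda} \int_0^{\infty} \frac{u_2^{\frac{1}{2}n_1 - 1}}{(n_1 u_2 + n_2)^{\frac{1}{2}(n_1 + n_2)}}\, u_1^{\frac{1}{2}(n_1 + n_2)} \exp\left[-\frac{1}{2} \frac{u_1}{n_1 u_2 + n_2} \left(\frac{n_1 u_2}{\sigma_1^2} + \frac{n_2}{\sigma_2^2}\right)\right] du_1\, du_2,$

where $P(u_2 < \lambda)$ is the probability of $u_2$ being less than $\lambda$.
Integrating out $u_1$ and expressing the result in terms of the incomplete beta
function we obtain

(4) $(n_1 + n_2) E_1 = \frac{n_1 I_{x_0}(\frac{1}{2}n_1 + 1, \frac{1}{2}n_2)\, \sigma_1^2 + n_2 I_{x_0}(\frac{1}{2}n_1, \frac{1}{2}n_2 + 1)\, \sigma_2^2}{P(u_2 < \lambda)},$

where $x_0 = (n_1 \lambda \phi)/(n_2 + n_1 \lambda \phi)$, $\phi = \sigma_2^2/\sigma_1^2$.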
We wish now to find the expected value of $s_1^2$ when $s_1^2/s_2^2 \ge \lambda$. Again we start
with the joint distribution of $v_1$ and $v_2$, given above and this time let

$\frac{v_2}{v_1} = Y, \quad v_1 = v_1,$

then the expected value, $E_2$, of $v_1$, when $Y \le \frac{1}{\lambda}$ is

$E_2 = \frac{c_1}{P(Y \le 1/\lambda)} \int_0^{1/\lambda} \int_0^{\infty} v_1^{\frac{1}{2}(n_1 + n_2)}\, Y^{\frac{1}{2}n_2 - 1} \exp\left[-\frac{1}{2} v_1 \left(\frac{n_1}{\sigma_1^2} + \frac{n_2 Y}{\sigma_2^2}\right)\right] dv_1\, dY.$

Integrating out $v_1$ as a gamma function, and expressing the results as incomplete
beta functions we obtain

(5) $E_2 = \frac{\{1 - I_{x_0}(\frac{1}{2}n_1 + 1, \frac{1}{2}n_2)\}\, \sigma_1^2}{P(Y \le 1/\lambda)},$

where $x_0 = n_1 \lambda \phi/(n_2 + n_1 \lambda \phi)$ as before.


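5. Final Results. The probability that we use $(n_1 s_1^2 + n_2 s_2^2)/(n_1 + n_2)$ is
$P(u_2 < \lambda)$. From equation (4) the contribution from this case to the mean
value of e* is

$\frac{n_1 I_{x_0}(\frac{1}{2}n_1 + 1, \frac{1}{2}n_2)\, \sigma_1^2 + n_2 I_{x_0}(\frac{1}{2}n_1, \frac{1}{2}n_2 + 1)\, \sigma_2^2}{n_1 + n_2}.$

The probability that we use $s_1^2$ is $P(Y \le 1/\lambda)$. From equation (5) the contribution
from this case is

$[1 - I_{x_0}(\tfrac{1}{2}n_1 + 1, \tfrac{1}{2}n_2)]\, \sigma_1^2.$

The expected value of e* is obtained by combining the two cases, i.e.,

(6) $E(e^*) = \left[1 + \frac{n_2}{n_1 + n_2} \left\{\frac{\sigma_2^2}{\sigma_1^2} I_{x_0}(\tfrac{1}{2}n_1, \tfrac{1}{2}n_2 + 1) - I_{x_0}(\tfrac{1}{2}n_1 + 1, \tfrac{1}{2}n_2)\right\}\right] \sigma_1^2.$

Hence the bias in e*, expressed as a fraction of $\sigma_1^2$ is

(7) $\frac{n_2}{n_1 + n_2} \left[I_{x_0}(\tfrac{1}{2}n_1, \tfrac{1}{2}n_2 + 1) \frac{\sigma_2^2}{\sigma_1^2} - I_{x_0}(\tfrac{1}{2}n_1 + 1, \tfrac{1}{2}n_2)\right].$

We note that in estimating $\sigma_1^2$ there will be a positive bias, no bias, or a negative
bias according as

$\frac{I_{x_0}(\frac{1}{2}n_1, \frac{1}{2}n_2 + 1)}{I_{x_0}(\frac{1}{2}n_1 + 1, \frac{1}{2}n_2)}$

is greater than, equal to, or less than $\sigma_1^2/\sigma_2^2$.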
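
Equation (6) is easy to evaluate numerically. The sketch below is not the author's computation (the paper used printed tables of the incomplete beta function): it substitutes a Simpson-rule approximation of $I_x(a, b)$, and all names are hypothetical. It assumes $n_1 \ge 2$ and $n_2 \ge 2$ so that the integrand stays bounded.

```python
import math


def reg_inc_beta(x, a, b, steps=2000):
    """Regularized incomplete beta I_x(a, b) by Simpson's rule (assumes a, b >= 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def f(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    h = x / steps  # steps must be even for Simpson's rule
    total = f(0.0) + f(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(k * h)
    return math.exp(log_norm) * total * h / 3


def expected_ratio(n1, n2, phi, lam):
    """E(e*)/sigma_1^2 from equation (6); phi = sigma_2^2/sigma_1^2, lam = lambda."""
    x0 = n1 * lam * phi / (n2 + n1 * lam * phi)
    pooled_part = phi * reg_inc_beta(x0, n1 / 2, n2 / 2 + 1)
    unpooled_part = reg_inc_beta(x0, n1 / 2 + 1, n2 / 2)
    return 1 + n2 / (n1 + n2) * (pooled_part - unpooled_part)


# n1 = 4, n2 = 20, sigma_2^2/sigma_1^2 = .1, lambda at the 1 percent point (4.43).
one_percent = expected_ratio(4, 20, 0.1, 4.43)
never_pool = expected_ratio(4, 20, 0.1, 0.0)
nearly_always_pool = expected_ratio(4, 20, 0.1, 1e12)
```

Subtracting unity from `expected_ratio` gives the fractional bias (7). A very large `lam` stands in for $\lambda = \infty$, since infinity itself would give an indeterminate $x_0$ in floating point.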
6. Identity and checks. If $\sigma_1 = \sigma_2$, then in section 4, $E_1 = \sigma_1^2$ and

$P(u_2 < \lambda) = I_{x_0}(\tfrac{1}{2}n_1, \tfrac{1}{2}n_2).$

From (4) this gives the identity

$(n_1 + n_2) I_{x_0}(\tfrac{1}{2}n_1, \tfrac{1}{2}n_2) = n_1 I_{x_0}(\tfrac{1}{2}n_1 + 1, \tfrac{1}{2}n_2) + n_2 I_{x_0}(\tfrac{1}{2}n_1, \tfrac{1}{2}n_2 + 1)$

where $x_0 = n_1 \lambda/(n_2 + n_1 \lambda)$. This identity may be established easily by elementary
calculus.

The first result in equation (6) may be checked by noting that when $\lambda = \infty$,
i.e. when the two mean squares are always pooled, $x_0$ is 1 and equation (6)
reduces to $(n_1 \sigma_1^2 + n_2 \sigma_2^2)/(n_1 + n_2)$. Similarly when $\lambda = 0$, in which case there
is no pooling, $x_0 = 0$, and equation (6) reduces to $\sigma_1^2$.


7. Discussion. In making a choice of an appropriate estimate of $\sigma_1^2$ we may
consider three procedures:

(1) Use $s_1^2$ always. This has the merit of having no bias, but is likely to have
a large sampling error.

(2) Always pool, i.e., use $(n_1 s_1^2 + n_2 s_2^2)/(n_1 + n_2)$. When $\sigma_1^2 \ne \sigma_2^2$ this is biased, but in
compensation will have less sampling error than (1) since it will be based on
$(n_1 + n_2)$ degrees of freedom.

(3) Use the test of significance of $s_1^2/s_2^2$ as a criterion in making the decision
as to whether to pool the two mean squares or not. If the test discriminates
properly between cases where pooling should or should not be made, the preliminary
test of significance criterion will utilize the extra $n_2$ degrees of freedom
whenever permissible and also avoid the bias in method (2).

In Table I the expected value $E(e^*)$ divided by $\sigma_1^2$, is given for two sets of
values of $n_1$, $n_2$ somewhat typical of those frequently encountered in applied
work, and for a series of values of $\sigma_2^2/\sigma_1^2$. In addition to the case of always pooling
($\lambda = \infty$) and that of never pooling ($\lambda = 0$), the results for $\lambda$ at the 1 percent,
5 percent, 20 percent levels and for $\lambda = 1$ have been tabulated. By subtracting
unity from the results the bias is obtained as a fraction of $\sigma_1^2$. The Table was
computed from the incomplete beta function Tables [6].

When the two mean squares are always pooled, the fractional bias is negative
and increases numerically as $\sigma_2^2$ becomes small relative to $\sigma_1^2$. By examination
of the values in Table I for $\sigma_2^2/\sigma_1^2 = .1$, it will be seen that the preliminary test
of significance controls the bias well when $\sigma_2^2$ is much smaller than $\sigma_1^2$, that is
when a large bias from pooling is most to be feared. This result happens because
in such cases the preliminary test allows pooling only in a small proportion
of samples.

If $\lambda$ is taken at the 1 or 5 percent levels, the maximum bias appears to occur
when $\sigma_2^2/\sigma_1^2$ is in the region 0.4-0.5, there being little bias when $\sigma_2^2$ is near $\sigma_1^2$.
The lower values of $\lambda$ (20 percent or $\lambda$ equals 1) control the bias satisfactorily
in the region $\sigma_2^2 < .6\sigma_1^2$, but have a fairly substantial positive bias when $\sigma_2^2 = \sigma_1^2$,
that is when pooling would actually be justified. By use of the relation between
the incomplete beta function and the sum of the terms of a binomial series it
can be shown that there is always a positive bias when $\sigma_1^2 = \sigma_2^2$ and that for
given numbers of degrees of freedom this bias is greatest when $\lambda = 1$.

To summarize from the example in Table I, it seems that for small values of
$n_1$ and $n_2$ none of the values of $\lambda$ which have been investigated controls the bias
throughout the whole range $0 < \sigma_2^2/\sigma_1^2 < 1$.


TABLE I
Expected Value of $e^*/\sigma_1^2$: $E(e^*)/\sigma_1^2$

Case 1: $n_1 = 4$, $n_2 = 20$

                                 $\sigma_2^2/\sigma_1^2$
$\lambda$              .1     .2     .3     .4     .5     .6     .7     .8     .9
$\infty$ (always pool) .250   .333   .417   .500   .583   .667   .750   .833   .917
1 percent level        .965   .870   .791   .750   .748   .775   .821   .880   .948
5 percent level        .991   .960   .924   .901   .892   .903   .930   .970   1.02
20 percent level       1.00   .999   1.00   1.01   1.02   1.04   1.07   1.11   1.15
1                      1.00   1.00   1.01   1.03   1.05   1.08   1.11   1.15   1.20
0 (never pool)         1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00

Case 2: $n_1 = 12$, $n_2 = 10$

                                 $\sigma_2^2/\sigma_1^2$
$\lambda$              .1     .2     .3     .4     .5     .6     .7     .8     .9
$\infty$ (always pool) .591   .636   .682   .727   .773   .818   .864   .909   .955
1 percent level        .981   .896   .833   .814   .824   .850   .884   .922   .963
5 percent level        .998   .973   .935   .909   .901   .908   .928   .955   .989
20 percent level       1.00   .998   .993   .987   .986   .991   1.00   1.02   1.04
1                      1.00   1.00   1.00   1.00   1.01   1.02   1.04   1.06   1.08
0 (never pool)         1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00








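8. The variance of e*. Using the same method we may obtain the variance
of e*. The final result is

(8) $V = \frac{n_1(n_1 + 2) I_{x_0}(\frac{1}{2}n_1 + 2, \frac{1}{2}n_2)\, \sigma_1^4 + 2 n_1 n_2 I_{x_0}(\frac{1}{2}n_1 + 1, \frac{1}{2}n_2 + 1)\, \sigma_1^2 \sigma_2^2 + n_2(n_2 + 2) I_{x_0}(\frac{1}{2}n_1, \frac{1}{2}n_2 + 2)\, \sigma_2^4}{(n_1 + n_2)^2}$

$\quad + \frac{n_1 + 2}{n_1} [1 - I_{x_0}(\tfrac{1}{2}n_1 + 2, \tfrac{1}{2}n_2)]\, \sigma_1^4 - \left[1 + \frac{n_2}{n_1 + n_2} \left\{I_{x_0}(\tfrac{1}{2}n_1, \tfrac{1}{2}n_2 + 1) \frac{\sigma_2^2}{\sigma_1^2} - I_{x_0}(\tfrac{1}{2}n_1 + 1, \tfrac{1}{2}n_2)\right\}\right]^2 \sigma_1^4.$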





From the relations in deriving this result the following identity was obtained:

$(n_1 + n_2 + 2)(n_1 + n_2) I_{x_0}(\tfrac{1}{2}n_1, \tfrac{1}{2}n_2) = n_1(n_1 + 2) I_{x_0}(\tfrac{1}{2}n_1 + 2, \tfrac{1}{2}n_2)$
$\quad + 2 n_1 n_2 I_{x_0}(\tfrac{1}{2}n_1 + 1, \tfrac{1}{2}n_2 + 1) + n_2(n_2 + 2) I_{x_0}(\tfrac{1}{2}n_1, \tfrac{1}{2}n_2 + 2).$

This identity can be readily established by elementary calculus.

As a check on the result in equation (8), we note that if $\lambda = \infty$, then $x_0 = 1$,
and $s_1^2/s_2^2 < \lambda$ always. The variance of the estimate of variance becomes
$2(n_1 \sigma_1^4 + n_2 \sigma_2^4)/(n_1 + n_2)^2$, which checks with the variance of $(n_1 s_1^2 + n_2 s_2^2)/(n_1 + n_2)$
for the case of always pooling. If in addition $\sigma_1 = \sigma_2$, then
$V = 2\sigma_1^4/(n_1 + n_2)$. If $\lambda = 0$, then $x_0 = 0$, and $s_1^2/s_2^2 \ge \lambda$ always. The variance
of the estimate of variance becomes $2\sigma_1^4/n_1$ which checks with the variance of $s_1^2$
for the case of never pooling.

The expression for the variance of e* enables us to investigate how much has
been gained in terms of reduction in variance by the use of the preliminary test.
The quantity $\{V + (\text{Bias})^2\}$ is the appropriate value for the whole sampling
error, where V is given by (8) and the bias by (7). For the two numerical
examples these quantities are shown as fractions of $\sigma_1^4$ in Table II.

As a standard of comparison the variances of the estimate $s_1^2$ (no pooling) will
be used. In these examples the preliminary test with $\lambda = 1$ produces a variance
smaller than that of $s_1^2$ for all values of $\sigma_2^2/\sigma_1^2$ except the lowest (0.1) where the
two variances are equal. As $\lambda$ is taken successively higher there is a substantial
reduction in variance when $\sigma_2^2$ is near $\sigma_1^2$ but an increase in variance over that of
$s_1^2$ when $\sigma_2^2/\sigma_1^2$ is small. Throughout nearly all the range of values of $\sigma_2^2/\sigma_1^2$,
the smallest variance is obtained by always pooling ($\lambda = \infty$), despite the relatively
large bias given by that method. This result is a reflection of the instability
of estimates of variance which are based on only a few degrees of
freedom.


III. EXAMPLE TWO: TEST OF A REGRESSION COEFFICIENT
1. Regression and some properties of orthogonal functions. Let

$y = \beta_1 x_1 + \beta_2 x_2 + e$

be a linear regression of y on the two variates $x_1$ and $x_2$ in which $\beta_1$ and $\beta_2$ are the
respective population regression coefficients and e is the error. We assume that
$x_1$, $x_2$ and y are measured from their respective sample means and that the values
of $x_1$ and $x_2$ are fixed from sample to sample. In order to make comparisons
among samples of different sizes we assume that $x_1$ and $x_2$ have unit variances
and correlation coefficient¹ $\rho$ so that

$S(x_1^2) = S(x_2^2) = n - 1, \quad S(x_1 x_2) = \rho(n - 1),$


¹ Although $\rho$ is commonly used to denote a population correlation coefficient, we are
using it here for the sample correlation coefficient between the fixed variates $x_1$ and $x_2$.





TABLE II
The Variance of e* About its True Mean: $\{V + (\text{Bias})^2\}/\sigma_1^4$

Case 1: $n_1 = 4$, $n_2 = 20$

                                 $\sigma_2^2/\sigma_1^2$
$\lambda$              .2     .3     .4     .5     .6
$\infty$ (always pool) .462   .360   .275   .205   .149
1 percent level        .620   .603   .523   .350   .323
5 percent level        .045   .554   .528   .479   .414
20 percent level       .500   .493   .480   .458   .435
1                      .493   .480   .462   .441   .423
0 (never pool)         .500   .500   .500   .500   .500

Case 2: $n_1 = 12$, $n_2 = 10$

                                 $\sigma_2^2/\sigma_1^2$
$\lambda$              .3     .4     .5     .6     .7     .8     .9     1.0
$\infty$ (always pool) .154   .130   .112   .097   .088   …      .085   .091
1 percent level        .203   .171   .141   .118   .103   …      .092   .096
5 percent level        .194   .183   .163   .142   .125   .114   .109   .109
20 percent level       .171   .170   .164   .156   .146   .139   .135   .135
1                      .165   .162   .158   .152   .148   .144   .144   .147
0 (never pool)         .167   .167   .167   .167   .167   .167   .167   .167

where $S(x_1^2)$ denotes summation of $x_1^2$ over the sample, with similar meanings
for $S(x_2^2)$ and $S(x_1 x_2)$, where n is the sample size.
We make the orthogonal transformations

$\xi_1 = x_1, \quad \xi_2 = x_2 - \rho x_1,$

then

$y = \beta_1 \xi_1 + \beta_2(\xi_2 + \rho \xi_1) + e.$

But

$S(\xi_1^2) = n - 1, \quad S(\xi_2^2) = (n - 1)(1 - \rho^2), \quad S(\xi_1 \xi_2) = 0,$

therefore

$S(y \xi_1) = \beta_1(n - 1) + \beta_2 \rho(n - 1) + S(x_1 e),$

and

$S(y \xi_2) = \beta_2(n - 1)(1 - \rho^2) + S\{(x_2 - \rho x_1) e\}.$

198 T. A. BANCROFT 


Now if we represent the regression coefficients of y on the $\xi$'s as B's we have

$B_1 S(\xi_1^2) = S(\xi_1 y), \quad B_2 S(\xi_2^2) = S(\xi_2 y).$

The reduction in the total sum of squares due to $x_1$ ignoring $x_2$ is

$\frac{[S(\xi_1 y)]^2}{S(\xi_1^2)} = \frac{[(n - 1)(\beta_1 + \beta_2 \rho) + S(x_1 e)]^2}{n - 1}.$

The reduction in the total sum of squares due to $x_2$ after fitting $x_1$ is

$\frac{[S(\xi_2 y)]^2}{S(\xi_2^2)} = \frac{[(n - 1)\beta_2(1 - \rho^2) + S\{(x_2 - \rho x_1) e\}]^2}{(n - 1)(1 - \rho^2)}.$

The reduction in the total sum of squares due to regression is

$B_1 S(y \xi_1) + B_2 S(y \xi_2) = \frac{[(n - 1)(\beta_1 + \beta_2 \rho) + S(x_1 e)]^2}{n - 1} + \frac{[S\{(x_2 - \rho x_1) e\} + (n - 1)\beta_2(1 - \rho^2)]^2}{(n - 1)(1 - \rho^2)},$

in which the two parts are independently distributed.

Let $b_1'$ be the regression coefficient of y on $x_1$ when the term containing $x_2$ is
omitted from the regression equation. Now,

$b_1' = \frac{S(\xi_1 y)}{S(\xi_1^2)} = \frac{(n - 1)(\beta_1 + \beta_2 \rho) + S(x_1 e)}{n - 1}.$

Hence

(9) $E(b_1') = \beta_1 + \beta_2 \rho.$

Let $b_2$ be the regression coefficient of y on $x_2$ if both $x_1$ and $x_2$ are used. Then

$b_2 = B_2 = \frac{S(\xi_2 y)}{S(\xi_2^2)} = \frac{(n - 1)\beta_2(1 - \rho^2) + S\{(x_2 - \rho x_1) e\}}{(n - 1)(1 - \rho^2)}.$

And

$V(b_2) = \frac{S(\xi_2^2)}{[S(\xi_2^2)]^2} = \frac{1}{S(\xi_2^2)} = \frac{1}{(n - 1)(1 - \rho^2)}.$

The normal equations for $Y = b_1 x_1 + b_2 x_2$ are

$b_1 S(x_1^2) + b_2 S(x_1 x_2) = S(x_1 y),$
$b_1 S(x_1 x_2) + b_2 S(x_2^2) = S(x_2 y).$

Now

$\frac{S(x_1 y)}{S(x_1^2)} = b_1 + b_2 \frac{S(x_1 x_2)}{S(x_1^2)}.$

Therefore

$b_1' = b_1 + b_2 \rho,$

or

$b_1 = b_1' - \rho b_2.$

Therefore

(10) $E(b_1) = \beta_1 + \beta_2 \rho - \rho E(b_2).$

We notice that if $\rho = 0$, $b_1$ is unbiased in any selected portion of the population.
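
The relation $b_1' = b_1 + b_2 \rho$ between the two fitted coefficients can be verified on any small data set. The sketch below uses made-up numbers and hypothetical helper names. It rescales $x_1$ and $x_2$ so that $S(x_1^2) = S(x_2^2) = n - 1$, as assumed in this section, and solves the two normal equations directly.

```python
import math


def standardized(values):
    """Center the values and rescale so that their sum of squares is n - 1."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    scale = math.sqrt(sum(c * c for c in centered) / (len(values) - 1))
    return [c / scale for c in centered]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fitted_coefficients(x1, x2, y):
    """Return (b1, b2, b1_prime, rho) for variates measured from their means."""
    s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
    s1y, s2y = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s2y * s12) / det   # solution of the normal equations
    b2 = (s2y * s11 - s1y * s12) / det
    b1_prime = s1y / s11                 # x2 omitted from the equation
    rho = s12 / math.sqrt(s11 * s22)
    return b1, b2, b1_prime, rho


# Made-up observations, used only to exercise the identity.
x1 = standardized([1, 2, 3, 4, 5, 6])
x2 = standardized([2, 1, 4, 3, 6, 8])
y_mean = sum([1.1, 1.9, 3.2, 3.8, 5.3, 6.4]) / 6
y = [v - y_mean for v in [1.1, 1.9, 3.2, 3.8, 5.3, 6.4]]

b1, b2, b1_prime, rho = fitted_coefficients(x1, x2, y)
```

The identity is algebraic, so it holds for the sample values whatever the errors are; the bias discussed in the text arises only when the choice between $b_1'$ and $b_1$ depends on the data.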


2. Statement of the problem. To obtain an estimate of $\beta_1$, in a particular
analysis in hand, in which it is desirable to choose by means of a test of significance
between using the regression equation $Y = b_1 x_1 + b_2 x_2$ and $Y' = b_1' x_1$,
we formulate a rule of procedure.


3. Rule of procedure. Calculate the following analysis of variance:

                          Degrees of freedom    Sum of squares                                                                  Mean square
Reduction due to $x_1$    1                     $[(n - 1)(\beta_1 + \beta_2 \rho) + S(x_1 e)]^2/(n - 1)$                        $s_1^2$
Reduction due to $x_2$
  after fitting $x_1$     1                     $[(n - 1)\beta_2(1 - \rho^2) + S\{(x_2 - \rho x_1) e\}]^2/\{(n - 1)(1 - \rho^2)\}$   $s_2^2$
Residual                  n - 3                 $S(y - Y)^2$                                                                    $s_3^2$

Test $s_2^2/s_3^2$ by the F-test. If F is non-significant at some assigned significance
level we omit the term containing $x_2$ and use

$b_1' = \frac{(n - 1)(\beta_1 + \beta_2 \rho) + S(x_1 e)}{n - 1}$

as the estimate of $\beta_1$. If F is significant, we retain the term containing $x_2$,
and use $b_1$ as the estimate of $\beta_1$. The estimate obtained by this rule will be
called b*.
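
In terms of sample quantities the rule can be sketched as follows. The function name and the data are hypothetical, and the critical value $\lambda$ for 1 and n - 3 degrees of freedom is assumed to come from tables. The reduction due to $x_2$ after fitting $x_1$ is computed as the total reduction due to regression minus the reduction due to $x_1$ alone, which is what the orthogonal decomposition of section 1 implies.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def preliminary_test_b1(x1, x2, y, critical_f):
    """Return (b*, F): the estimate chosen by the preliminary test and the F ratio."""
    x1, x2, y = centered(x1), centered(x2), centered(y)
    n = len(y)
    s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
    s1y, s2y, syy = dot(x1, y), dot(x2, y), dot(y, y)
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s2y * s12) / det      # both variates retained
    b2 = (s2y * s11 - s1y * s12) / det
    b1_prime = s1y / s11                    # x2 omitted
    reduction_x1 = s1y ** 2 / s11
    reduction_total = b1 * s1y + b2 * s2y
    s2_sq = reduction_total - reduction_x1  # 1 degree of freedom
    s3_sq = (syy - reduction_total) / (n - 3)
    f_ratio = s2_sq / s3_sq
    estimate = b1_prime if f_ratio < critical_f else b1
    return estimate, f_ratio


# Made-up observations for illustration.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 8]
y = [1.1, 1.9, 3.2, 3.8, 5.3, 6.4]
always_omit, f_ratio = preliminary_test_b1(x1, x2, y, float("inf"))
always_retain, _ = preliminary_test_b1(x1, x2, y, 0.0)
```

An infinite critical value always omits $x_2$ and returns $b_1'$; a critical value of zero always retains it and returns $b_1$.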


4. Object of this investigation. If we follow such a rule of procedure, what
will be the bias in b* as an estimate of $\beta_1$?


5. Mathematical derivation of the bias. First, we wish $E(b_1')$ when

$\frac{s_2^2}{s_3^2} < \lambda \quad \text{or} \quad \frac{b_2^2}{s_3^2} < \frac{\lambda}{(1 - \rho^2)(n - 1)},$

where $\lambda$ is the value on Snedecor's F-distribution corresponding to some assigned
significance level for 1 and (n - 3) degrees of freedom. From (9) we have

(11) $E(b_1') = \beta_1 + \beta_2 \rho,$

no matter what the value of $s_2^2/s_3^2$; since from section 1, $s_1^2$ and $s_2^2$ are independently
distributed.





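Next we wish $E(b_1)$ when $\frac{s_2^2}{s_3^2} \ge \lambda$ or $\frac{b_2^2}{s_3^2} \ge \frac{\lambda}{(1 - \rho^2)(n - 1)}$. To obtain this we
find it more convenient to find first $E(b_2)$ when

$\frac{b_2^2}{v_3} \ge \frac{\lambda}{(1 - \rho^2)(n - 1)}, \quad \text{or} \quad \frac{v_3}{b_2^2} \le \frac{1}{\lambda c_{22}},$

where

$v_3 = s_3^2 \quad \text{and} \quad c_{22} = \frac{1}{(n - 1)(1 - \rho^2)}.$

The joint distribution of $b_2$, $v_3$ is

$K e^{-\frac{1}{2}(b_2 - \beta_2)^2/c_{22}}\, v_3^{\frac{1}{2}(n - 5)}\, e^{-\frac{1}{2}(n - 3) v_3}\, db_2\, dv_3,$

where K is a constant. We make the transformation of variables

$u = \frac{v_3}{b_2^2}, \quad dv_3 = b_2^2\, du;$

then the joint distribution of $b_2$ and u is

$K e^{-\frac{1}{2}(b_2 - \beta_2)^2/c_{22}}\, (b_2^2 u)^{\frac{1}{2}(n - 5)}\, e^{-\frac{1}{2}(n - 3) b_2^2 u}\, b_2^2\, du\, db_2.$

Taking the expected value when $u \le \frac{1}{\lambda c_{22}}$, we have

$E(b_2) = \frac{K}{P\left(u \le \frac{1}{\lambda c_{22}}\right)} \int_{-\infty}^{\infty} \int_0^{1/\lambda c_{22}} b_2\, |b_2|^{n - 3}\, u^{\frac{1}{2}(n - 5)} \exp\left[-\frac{(b_2 - \beta_2)^2}{2 c_{22}} - \frac{(n - 3) b_2^2 u}{2}\right] du\, db_2,$

where $v_3 = s_3^2$, and $P\left(u \le \frac{1}{\lambda c_{22}}\right)$ is the probability that u be less than or equal
to $\frac{1}{\lambda c_{22}}$.
Dropping subscripts for convenience and expanding the factor which involves
e to the first power of b, we have

$E(b) = \frac{K e^{-\beta^2/2c}}{P\left(u \le \frac{1}{\lambda c}\right)} \int_{-\infty}^{\infty} \int_0^{1/\lambda c} b\, |b|^{n - 3}\, u^{\frac{1}{2}(n - 5)}\, e^{-\frac{b^2}{2c} - \frac{(n - 3) b^2 u}{2}} \left[1 + \frac{\beta b}{c} + \frac{1}{2!}\left(\frac{\beta b}{c}\right)^2 + \cdots\right] du\, db, \quad 0 < u < \frac{1}{\lambda c}.$
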


Now, clearly the even terms of the series vanish whether n is odd or even when 
b is integrated out. 

After integration with respect to b, we have an infinite series in which the
typical term (apart from constants) is of the form

$\frac{u^{\frac{1}{2}(n - 5)}}{[1 + (n - 3) c u]^{\frac{1}{2}(n + j - 2)}},$

where j is an even positive integer. Subsequent integration with respect to u
leads to an infinite series of incomplete integrals of the F distribution. By
transforming the integrals, the series may be expressed in terms of incomplete
beta functions as follows:


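Then we have

(12) $E(b_2) = \frac{\beta_2 e^{-a}}{P\left(u \le \frac{1}{\lambda c_{22}}\right)} \sum_{i=0}^{\infty} \frac{a^i}{i!}\, I_{x_0}\left(\frac{n - 3}{2}, \frac{3}{2} + i\right),$

where $x_0 = \dfrac{1}{\frac{\lambda}{n - 3} + 1}$
and $\lambda$ is the $\alpha$% point of the F-distribution for 1 and (n - 3) degrees of
freedom. Now from (10) we have

$E(b_1) = \beta_1 + \beta_2 \rho - \rho E(b_2)$

which enables us to obtain $E(b_1)$ from (12).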


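6. Final result. From (10), (11) and (12) we have

$E(b^*) = P\left(\frac{s_2^2}{s_3^2} < \lambda\right)(\beta_1 + \beta_2 \rho) + \left[1 - P\left(\frac{s_2^2}{s_3^2} < \lambda\right)\right][\beta_1 + \rho\{\beta_2 - E(b_2)\}]$

$= \beta_1 + \rho \beta_2 - \left[1 - P\left(\frac{s_2^2}{s_3^2} < \lambda\right)\right] \rho E(b_2).$

The bias in b* is

$\text{Bias} = \rho\left[\beta_2 - \left\{1 - P\left(\frac{s_2^2}{s_3^2} < \lambda\right)\right\} E(b_2)\right].$

Substituting the value of $E(b_2)$, we obtain

$\text{Bias} = \rho \beta_2 \left[1 - e^{-a} \sum_{i=0}^{\infty} \frac{a^i}{i!}\, I_{x_0}\left(\frac{n - 3}{2}, \frac{3}{2} + i\right)\right],$

where $x_0 = \dfrac{1}{\frac{\lambda}{n - 3} + 1}$, $a = (1 - \rho^2)\left(\dfrac{n - 1}{2}\right)\beta_2^2$.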


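7. Checks. From section 5 we have

$E(b_2) = \frac{\beta_2 e^{-a}}{P\left(\frac{v_3}{b_2^2} \le \frac{1}{\lambda c_{22}}\right)} \sum_{i=0}^{\infty} \frac{a^i}{i!}\, I_{x_0}\left(\frac{n - 3}{2}, \frac{3}{2} + i\right),$

where $x_0 = \dfrac{1}{\frac{\lambda}{n - 3} + 1}$.
If $\lambda = 0$, then $x_0 = 1$, and $E(b_2) = \beta_2$.
Also from section 5 we have

$\text{Bias} = \rho \beta_2 \left[1 - e^{-a} \sum_{i=0}^{\infty} \frac{a^i}{i!}\, I_{x_0}\left(\frac{n - 3}{2}, \frac{3}{2} + i\right)\right].$

If $\lambda = 0$, then $x_0 = 1$, and Bias = 0.
If $\lambda = \infty$, then $x_0 = 0$, and Bias = $\rho \beta_2$.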
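
The bias formula and its two limiting checks can be evaluated numerically. The sketch below is an illustration, not the author's computation: the names are hypothetical, the incomplete beta function is approximated by Simpson's rule instead of being read from tables, the series is truncated once the Poisson weights $a^i e^{-a}/i!$ are negligible, and $n \ge 5$ is assumed so that the integrand stays bounded.

```python
import math


def reg_inc_beta(x, a, b, steps=2000):
    """Regularized incomplete beta I_x(a, b) by Simpson's rule (assumes a, b >= 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def f(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    h = x / steps
    total = f(0.0) + f(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(k * h)
    return math.exp(log_norm) * total * h / 3


def bias_in_b_star(n, rho, beta2, lam):
    """Bias of b* as an estimate of beta_1, from the series of section 6."""
    a = (1 - rho ** 2) * (n - 1) / 2 * beta2 ** 2
    x0 = 1.0 / (1.0 + lam / (n - 3))
    terms = int(a + 10 * math.sqrt(a) + 30)  # enough terms for the Poisson weights
    series = 0.0
    weight = math.exp(-a)                    # a^i e^{-a} / i! at i = 0
    for i in range(terms):
        series += weight * reg_inc_beta(x0, (n - 3) / 2, 1.5 + i)
        weight *= a / (i + 1)
    return rho * beta2 * (1 - series)


no_test = bias_in_b_star(11, 0.2, 0.1, 0.0)            # lambda = 0
always_omit = bias_in_b_star(11, 0.2, 0.1, float("inf"))
five_percent = bias_in_b_star(11, 0.2, 0.1, 5.318)     # 5% point for n = 11
```

The first two calls reproduce the checks of this section (zero bias, and bias $\rho \beta_2$); the third uses n = 11, $\rho$ = .2, $\beta_2$ = .1 at the 5% point 5.318 and can be compared with the corresponding entry of Table III.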


8. Discussion. From the mathematical form of the bias,

$\text{Bias} = \rho \beta_2 \left[1 - e^{-a} \sum_{i=0}^{\infty} \frac{a^i}{i!}\, I_{x_0}\left(\frac{n - 3}{2}, \frac{3}{2} + i\right)\right],$

where $x_0 = \dfrac{1}{\frac{\lambda}{n - 3} + 1}$,
four deductions follow immediately: (i) There is no bias in estimating $\beta_1$, if
$\rho$ or $\beta_2$ is zero. (ii) The coefficient of $\beta_2$ in the formula is less than or at most
equal to one in absolute value. (iii) The sign of the bias depends upon the signs
of $\rho$ and $\beta_2$; it is positive if both are positive or both negative, it is negative if $\rho$
and $\beta_2$ have opposite signs. (iv) The bias in estimating $\beta_1$ is independent of $\beta_1$.

We shall discuss the bias in a few special cases by means of selected values of
n, $\rho$, $\beta_2$ and $\lambda$. In Table III are exhibited the values of the bias for n equal
to 5, 11, 21, each at $\rho$ equal to .2, .4, .6, .8, and $\beta_2$ equal to .1, .4, 1.0, 2.0,
and 4.0. These values have been computed at the 5% point for $\lambda$, and at $\lambda = 1$.
These special cases seem to indicate: (i) If we fix $\rho$, $\beta_2$, and $\lambda$ and increase n,
then the bias decreases. (ii) If we fix $\rho$, $\beta_2$, and n and change $\lambda$ from the 5%
point to $\lambda = 1$, the bias decreases considerably. (iii) If we fix $\rho$, n, $\lambda$ and increase


TABLE III
The Bias in Estimating $\beta_1$

$\lambda$ at the 5% point ($\lambda_{.05} = 5.318$ for n = 11)

                    n = 5                        n = 11                       n = 21
$\beta_2$   $\rho$ = .2   .4     .6     .8       .2     .4     .6     .8       .2     .4     .6     .8
0.1         …      …      …      .069     .015   .030   .046   .061     .014   .029   …      …
0.4         …      …      …      .272     .049   .101   .159   .227     .033   .071   …      …
1.0         …      …      …      .640     .028   .072   .164   .350     .001   .006   …      …
2.0         …      …      …      …        .000   .000   .001   .083     .000   .000   .000   …
4.0         …      …      …      …        .000   .000   .000   .000     .000   .000   .000   …

$\lambda = 1$

                    n = 5                        n = 11                       n = 21
$\beta_2$   $\rho$ = .2   .4     .6     .8       .2     .4     .6     .8       .2     .4     .6     .8
0.1         .004   .008   .011   .015     .004   .008   .011   .015     .004   .008   .011   …
0.4         .012   .026   .040   .057     .009   .019   .032   .051     .005   .011   .022   …
1.0         .011   .025   .049   …        .001   .003   .010   .039     .000   .000   .001   .009
2.0         .000   .002   .008   .043     .000   .000   .000   .000     .000   .000   .000   .000
4.0         .000   .000   .000   .000     .000   .000   .000   .000     .000   .000   .000   .000

$\beta_2$, the bias increases and then decreases. This may be explained in the following
manner. From section 6, the bias may be written in the form


* s) = ae" n—3 3 
tin « & ——t 2. = If -,o+ i). 


P (: > s) i=0 a! 2 ’ 2 


Now if p, n, \ are held constant and # is relatively small, P(e < s) is relatively 


large and ya Ae nie : 3 + ‘)i is relatively large, but P(e > ) is rela- 


i=(0 


tively small. ais e, foraw am as we increase 62 the bias will increase, but as B. 


do ca i —a oT o e : 
gets larger < < s) and )) a’e “I af? 5 : 5+ ' becomes smaller while 
dg i=0 


(2 > } becomes larger. Hence, a value of 82 will be reached at which the 
)3 







bias begins to decrease. (iv) If we fix $n$, $\beta_2$, and $\lambda$ and increase $\rho$, the bias
increases without exception.

The above results were obtained under the assumption that a test of significance
criterion is used in making a choice as to the number of independent
variables to be retained after the regression $y = b_1x_1 + b_2x_2$ has been fitted.
If this test of significance criterion is used, we may wish to have a means of
controlling the bias. From a study of Table III we note that the bias may be
decreased by increasing $n$ and by using $\lambda = 1$. We also notice that as $\beta_2$
increases from 0.1 to 4.0 the bias increases and then decreases, and so passes
through a maximum value. Hence, if we have a regression in which $\beta_2$ is fairly
well below or above the value at which this maximum occurs, we would expect a smaller bias.

The bias in estimating $\beta_1$ is "unstudentized," i.e., is a function of the population
parameters $\rho$ and $\beta_2$. In any particular analysis in hand, it would be necessary
to know the values of $\rho$ and $\beta_2$ or be willing to use estimates of them obtained
from the data.

It is realized that only a beginning has been made on the regression problem:
an investigation should be undertaken of the more general problem of the use of a
test of significance criterion in making a choice as to the number of independent
variables to be retained after the regression

    $y = b_1x_1 + b_2x_2 + b_3x_3 + \cdots$

has been fitted.


REFERENCES 


[1] J. Wishart and A. R. Clapham, "A study in sampling techniques: the effect of artificial
fertilizers on the yield of potatoes," Jour. Agri. Sci., Vol. 19 (1929), Part 4, p. 605.

[2] W. G. Cochran, "The omission or addition of an independent variate in multiple linear
regression," Jour. Roy. Stat. Soc. Suppl., Vol. 5 (1938), p. 171.

[3] G. W. Snedecor, Statistical Methods, Third edition. Iowa State College Press, Ames,
Iowa, 1940, Sec. 14.3.

[4] M. S. Bartlett, "Square root transformation in analysis of variance," Jour. Roy.
Stat. Soc. Suppl., Vol. 3 (1936), pp. 76-77.

[5] Joseph Berkson, "Tests of significance considered as evidence," Jour. Amer. Stat.
Assn., Vol. 37 (1942), pp. 325-335.

[6] Karl Pearson (Editor), Tables of the Incomplete Beta Function, Cambridge Univ.
Press. Biometrika. Univ. Coll., London, 1934.





THE PROBABILITY OF CONVERGENCE OF AN ITERATIVE PROCESS 
OF INVERTING A MATRIX 


By Joseph Ullman


Columbia University 


Introduction. The inversion of a matrix is a computational problem of wide 
application. This is a further study of an efficient iterative method of matrix
inversion described by Harold Hotelling [1], with an examination of the prob- 
ability of convergence in relation to the accuracy of the initial approximation. 
The lines of investigation were suggested both by his article and by helpful 
comments made during the course of the research. 

The inverse of a matrix can be obtained to any desired degree of accuracy by 
using a variation of the Doolittle method, and starting with a sufficient number 
of accurate decimal places in the matrix being inverted. This procedure becomes
inefficient if the order of the matrix is large, or if the desired degree of
accuracy is very great. In either case the efficiency can be greatly increased 
by first obtaining an approximation to a small number of decimal places and then 
applying a method of iteration until the desired accuracy is achieved. 


1. Iterative methods. Hotelling's method of iteration is as follows. Let $A$
be the matrix to be inverted and let $C_0$ be the approximation to the inverse.
Calculate in turn $C_1, C_2, \cdots$ where

(1.1)    $C_{m+1} = C_m(2 - AC_m)$.

This sequence of matrices will converge to the inverse of $A$ if the roots of

(1.2)    $D = 1 - AC_0$

are all less than unity in absolute value.

The iterative method (1.1) will be generalized to yield a class of iterative
methods, one element of which will be shown to be more efficient, in certain cases,
than method (1.1). The generalized iterative method is

(1.3)    $C_{m+1} = C_m\{1 + (1 - AC_m) + (1 - AC_m)^2 + \cdots + (1 - AC_m)^{k-1}\}$.

For every $k$, the condition for convergence is that the roots of the matrix (1.2)
all be less than unity in absolute value.
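As an illustration of the generalized iteration (1.3), the following is a minimal sketch in plain Python. The function names, the 2 by 2 matrix and the rough starting approximation are hypothetical choices made for this example and are not taken from the paper. The bracketed sum is accumulated by Horner's rule, so that each pass uses $k$ matrix products.

```python
def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def identity_plus(M):
    """Return 1 + M, where 1 is the identity matrix."""
    n = len(M)
    return [[M[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def iterate_inverse(A, C0, k=2, steps=5):
    """Apply (1.3) `steps` times, starting from C0; k = 2 gives (1.1)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    C = [row[:] for row in C0]
    for _ in range(steps):
        AC = matmul(A, C)
        # R = 1 - A C_m
        R = identity_plus([[-v for v in row] for row in AC])
        # S = 1 + R + ... + R^(k-1), built by Horner's rule (k - 2 products).
        S = identity_plus(R)
        for _ in range(k - 2):
            S = identity_plus(matmul(R, S))
        C = matmul(C, S)
    return C


A = [[4.0, 1.0], [2.0, 3.0]]        # exact inverse: [[0.3, -0.1], [-0.2, 0.4]]
C0 = [[0.25, 0.0], [-0.25, 0.5]]    # rough first approximation (hypothetical)
inverse_k2 = iterate_inverse(A, C0, k=2)
inverse_k3 = iterate_inverse(A, C0, k=3)
```

For this starting value the sum of squares of the elements of $1 - AC_0$ is .625, so the sufficient condition for convergence used later in the paper is satisfied.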


A method of comparing the efficiency of these different iterative methods
arises from the following considerations. Since

(1.4)    $C_0 = A^{-1}(AC_0)$,

which is equivalent to

(1.5)    $C_0 = A^{-1}(1 - D)$,

it follows that

(1.6)    $A^{-1} = C_0(1 - D)^{-1}$.

When the roots of $D$ are all less than unity in absolute value, (1.6) has the
infinite expansion

(1.7)    $A^{-1} = C_0(1 + D + D^2 + D^3 + \cdots)$.


The general iterative process (1.3) generates the infinite series in the following
manner:

    $C_m = C_0(1 + D + \cdots + D^{k-1})(1 + D^k + \cdots + D^{k(k-1)}) \cdots$

Each parenthesis corresponds to one iteration. Hence $k^m$ terms are generated
by $m$ iterations. In order to achieve the accuracy of $n$ terms in (1.7),
$m = \log_e n/\log_e k$ iterations are required. Each iteration involves $k$ matrix
multiplications, so that $km = k \log_e n/\log_e k$ is the total number of matrix
multiplications necessary to achieve this degree of accuracy.

The integer for which this is a minimum is three. Therefore the "most efficient"
method of iteration is

(1.10)    $C_{m+1} = C_m\{1 + (1 - AC_m) + (1 - AC_m)^2\}$.


If the desired degree of accuracy can be achieved by one application of (1.1), 
or by two applications of (1.1) but not by one application of (1.10), then (1.1) 
is preferable. 
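The claim that three minimizes $k \log_e n/\log_e k$ can be checked numerically. The sketch below treats the number of iterations as a real number, ignoring the rounding up to a whole iteration that occurs in practice; the function name and the choice of 1000 terms are assumptions made for illustration.

```python
import math


def multiplications_needed(k, n_terms):
    """Total matrix multiplications k * log(n) / log(k) for n terms of (1.7)."""
    return k * math.log(n_terms) / math.log(k)


# Compare the cost for several values of k at a fixed target accuracy.
costs = {k: multiplications_needed(k, 1000) for k in range(2, 8)}
best_k = min(costs, key=costs.get)
```

Since the factor $\log_e n$ is common to every $k$, the comparison does not depend on the target $n$; only $k/\log_e k$ matters.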


2. The condition for convergence. The sequence

(2.1)    $C_1, C_2, C_3, \cdots$

obtained from (1.3) will converge to the inverse of $A$ if the roots of

(2.2)    $D = 1 - AC_0$

are all less than unity in absolute value. The following assumptions determine
the nature of $D$.

We assume that the expected value of each element of the first approximation
$C_0$ is equal to the corresponding element of the exact inverse of $A$. The actual
values of the elements of $C_0$ will deviate from their expected values. We will
consider two important cases. If the deviations are entirely due to the fact that
the elements of $C_0$ are only accurate to a limited number of decimal places, say
$k$, then the deviations may be regarded as distributed with constant density over
a range of length $10^{-k}$. It will be assumed that the deviations of the elements
of $C_0$ from their expected values are independent. While this case arises in
practice, we will first treat a closely related case, which lends itself to exact
treatment more readily. We assume that the deviations of the elements of $C_0$
are normally distributed about their expected values, with variance $\mu = 10^{-2k}/12$.
The variance $\mu$ is the same as that which arises if the probability density is
uniform with range $10^{-k}$.

The elements of $E$, the matrix of deviations,

(2.3)    $E = A^{-1} - C_0$,

are independently and normally distributed. Combining (2.2) and (2.3) we
obtain

(2.4)    $D = 1 - AC_0 = A(A^{-1} - C_0) = AE$.


Let $p$ be the order of the matrix $A$. Each element of $D$ will be a linear
combination of $p$ independently and normally distributed variables, and therefore
will itself be normally distributed. A sufficient condition for all the roots of $D$
to be less than unity in absolute value, and hence for the process of iteration to
converge, is for the sum of the squares of the elements of $D$ to be less than unity.
We will use the following notation:

(2.5)    $d_{ij}$: the element of $D$ in the $i$th row and $j$th column,
         $N_D^2 = \sum_i \sum_j d_{ij}^2$.


A procedure suggested by this relationship is to determine the probability
distribution of $N_D^2$, so that probability statements concerning the absolute value
of the roots of $D$ can be made. Because the elements of $D$ are not all
independent, no multiple of $N_D^2$ can be expected to have the $\chi^2_{(p^2)}$ distribution.¹

The distribution of $N_D^2$ is shown to be closely related to the chi-square
distribution in the next section, and on the basis of this relationship, lower bounds to
the probability of convergence of the iterative process are developed in section 4.
In section 5 the exact distribution of the norm is obtained for a general class of
cases. The final section is concerned with the validity of applying the results
of this study to a practical situation, where the deviations of the elements of $C_0$
from their expected values are uniformly, rather than normally, distributed.


















3. An equivalence. Let $e_{ij}$ be the element of $E$ in the $i$th row and the $j$th
column, and $a_{ij}$ be the element of $A$ in the $i$th row and the $j$th column. From
(2.4) and (2.5), we find that

(3.1)    $d_{ij} = \sum_k a_{ik}e_{kj}$.

Since the elements of $E$ are independently and normally distributed with
variance $\mu = 10^{-2k}/12$ it follows readily that

(3.2)    $E[e_{ij}e_{kl}] = \mu\,\delta_{ik}\delta_{jl}$.






¹ The number in the parentheses will indicate the number of degrees of freedom of the
chi-square distribution.


Making use of (3.1) and (3.2), we find that for two $d_{ij}$ in the same column,

(3.3)    $E[d_{ij}d_{kj}] = \mu \sum_t a_{it}a_{kt}$,

while for any two $d_{ij}$ in different columns,

(3.4)    $E[d_{ij}d_{kl}] = 0$.

From (3.3) and (3.4) it follows that it is permissible to regard the elements of
the $p$ columns of $D$ as the coordinates of $p$ independently selected points from a
multivariate normal universe with covariance matrix $\sigma = \mu AA'$. We will let
$\tau = \sigma^{-1}$.

The moment generating function of the sum of squares of the coordinates of
any point is

(3.5)    $\dfrac{|\tau|^{1/2}}{|\tau - 2t \cdot 1|^{1/2}}$.


This can also be written as

(3.6)    $\dfrac{1}{(1 - 2\sigma_1 t)^{1/2}(1 - 2\sigma_2 t)^{1/2} \cdots (1 - 2\sigma_p t)^{1/2}}$,

where $\sigma_1, \cdots, \sigma_p$ are the characteristic roots of $\sigma$.
Since $N_D^2$ is the sum of $p$ independent expressions of this type, its moment
generating function is the $p$th power of (3.6),

(3.7)    $\dfrac{1}{(1 - 2\sigma_1 t)^{p/2} \cdots (1 - 2\sigma_p t)^{p/2}}$.

This expression is the moment generating function of

(3.8)    $\sigma_1\chi^2_{(p)1} + \sigma_2\chi^2_{(p)2} + \cdots + \sigma_p\chi^2_{(p)p}$,

where the $\chi^2_{(p)i}$ are all independent.
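Writing the roots as

(3.9)    $\sigma_0,\ \sigma_0 - k_1,\ \cdots,\ \sigma_0 - k_{p-1}$,

where $\sigma_0$ is the largest root of $\sigma$, and all $k_i \geq 0$, it follows that $N_D^2$ has the same
distribution as

(3.10)    $\sigma_0 \sum_i \chi^2_{(p)i} - \sum_{i=1}^{p-1} k_i\chi^2_{(p)i}$.

Therefore, making use of the reproductive power of $\chi^2$, we obtain

(3.11)    $P\{N_D^2 < 1\} = P\{\sigma_0 \sum_i \chi^2_{(p)i} < 1 + \sum_{i=1}^{p-1} k_i\chi^2_{(p)i}\}$
                         $= P\{\sigma_0\chi^2_{(p^2)} < 1 + \sum_{i=1}^{p-1} k_i\chi^2_{(p)i}\}$.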





By making special assumptions about the $k_i$, close approximations to the
probability that $N_D^2$ will be less than one, and hence that the process of iteration
will converge, can be obtained. Instead of following this procedure, it is more
desirable to have definite lower bounds for the probability that $N_D^2$ will be less
than one. This will lead to an overstatement in the number of decimal places
of accuracy necessary in the first approximation $C_0$ to assure convergence, but
it will practically eliminate the possibility of having to recalculate the first
approximation, and hence will lead to greater efficiency in the long run.


4. The derivation of the formula for determining the required degree of
accuracy. The inequality used in this section is derived in two steps from (3.10).
Since $k_i \geq 0$ $(i = 1, \cdots, p - 1)$ it follows immediately that

(4.1)    $P\{N_D^2 < 1\} \geq P\{\sigma_0\chi^2_{(p^2)} < 1\}$.

In order to use this inequality, the upper bound for $\sigma_0$,

(4.2)    $\sigma_0 \leq (\mathrm{tr}\,\sigma^t)^{1/t}$,

can be used. For $t = 1$,

(4.3)    $\sigma_0 \leq \mathrm{tr}\,\sigma = \mathrm{tr}\,\mu AA' = \mu\,\mathrm{tr}\,AA' = \mu N_A^2$.

Dr. Wald pointed out that using (4.2) for $t = 1$ reduces the amount of information
retained in (4.1) to that which is contained in the inequality

(4.4)    $N(D) \leq N(A)N(E)$.

A closer upper bound is feasible in any particular case, and can be introduced
at this point by letting $t = 2$ or $t = 3$. The following formula will be developed
for the general case, making use of (4.3).
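The effect of the choice of $t$ in the bound (4.2) can be seen on a small example. The sketch below is only illustrative: the function name and the 2 by 2 matrix, whose characteristic roots are 3 and 1, are assumptions made for this example.

```python
def trace_power_bound(S, t):
    """Upper bound (tr S^t)^(1/t) for the largest root of a symmetric,
    non-negative definite matrix S stored as a list of rows."""
    n = len(S)
    P = [row[:] for row in S]
    for _ in range(t - 1):
        # P <- P S, so that after the loop P = S^t.
        P = [[sum(P[i][u] * S[u][j] for u in range(n)) for j in range(n)]
             for i in range(n)]
    return sum(P[i][i] for i in range(n)) ** (1.0 / t)


sigma = [[2.0, 1.0], [1.0, 2.0]]   # characteristic roots 3 and 1
bounds = [trace_power_bound(sigma, t) for t in (1, 2, 3)]
```

Here the three bounds are 4, $\sqrt{10}$ and $28^{1/3}$, which approach the largest root 3 from above as $t$ increases.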

Substituting (4.3) in (4.1), we obtain

(4.5)    $P\{N_D^2 < 1\} \geq P\{\chi^2_{(p^2)} < 1/(\mu N_A^2)\}$.

It is desirable to separate the effects of the order of the matrix $A$ on
convergence, and the order of magnitude of the elements. Hence we introduce as a
measure of the average size of the $a_{ij}$ their root mean square $m$, so that

(4.6)    $m^2 = \sum_i \sum_j a_{ij}^2 / p^2$.

Hence

(4.7)    $N_A = pm$.

The final form of the inequality is

(4.8)    $P\{N_D^2 < 1\} \geq P\{\chi^2_{(p^2)} < 12 \cdot 10^{2k}/(p^2m^2)\}$.
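The right-hand side of (4.8) is a chi-square probability and can be evaluated directly. The following sketch uses the standard power series for the incomplete gamma function in place of printed tables; the function names and the cut-off used for arguments far in the upper tail are assumptions of this example rather than part of the paper.

```python
import math


def chi2_cdf(x, dof):
    """P{chi-square with `dof` degrees of freedom < x}, by power series."""
    if x <= 0:
        return 0.0
    a, z = dof / 2.0, x / 2.0
    # Far in the upper tail the probability is 1 to machine accuracy, and the
    # series below would overflow, so return early (an ad hoc cut-off).
    if z > a + 12.0 * math.sqrt(a) + 50.0:
        return 1.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def convergence_lower_bound(p, m, k):
    """Right-hand side of (4.8) for order p, mean size m and k decimal places."""
    return chi2_cdf(12.0 * 10.0 ** (2 * k) / (p * p * m * m), p * p)


bound = convergence_lower_bound(2, 0.5, 0)   # about .98
```

For $p = 2$, $m = 1/2$, $k = 0$ the argument of the chi-square probability is 12 with 4 degrees of freedom, and the bound is $1 - 7e^{-6}$, or about .98.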





First we will obtain an expression for the number of decimal places required
in the first approximation to make the probability of convergence at least .999.
Then the expression will be checked directly by means of (4.8) and tables of the
chi-square function.

For large values of $p$, $\sqrt{2\chi^2_{(p^2)}}$ is approximately normally distributed with
mean value $\sqrt{2p^2 - 1}$ and unit variance [2], [5]. Applying this transformation
to the right hand side of (4.8), and noting that 3.1 standard deviations is slightly
greater than the deviation corresponding to .999, we obtain as the condition for

(4.9)    $P\{\chi^2_{(p^2)} < 12 \cdot 10^{2k}/(p^2m^2)\} > .999$

or

(4.10)    $P\{\sqrt{2\chi^2_{(p^2)}} < \sqrt{24 \cdot 10^{2k}/(p^2m^2)}\} > .999$

that it is sufficient that

(4.11)    $\sqrt{24 \cdot 10^{2k}/(p^2m^2)} - \sqrt{2p^2 - 1} > 3.1$.

This is equivalent to

(4.12)    $k > \log_{10} p + \log_{10} m + \log_{10}\left(\sqrt{p^2 - \tfrac{1}{2}} + 3.1/\sqrt{2}\right) - \tfrac{1}{2}\log_{10} 24 + \log_{10}\sqrt{2}$.

Since the characteristic of a logarithm is insensitive to the argument, rounding
off will introduce a negligible error, and we finally obtain an upper limit to the
lower bound of $k$,

(4.13)    $k > \log_{10} m + \log_{10} p + \log_{10}(p + 3) - .55$.


In order to verify the accuracy of (4.13) for small values of $p$, certain values of
$p$, $k$ and $m$ are chosen and the probabilities associated with (4.8) determined [2].
The entries in brackets are the corresponding values of $k$ determined from (4.13).

A typical example will illustrate the use of the table below. Let the matrix
$A$ to be inverted be a fourth order correlation matrix. The mean magnitude $m$ is
about 1/2 and $p = 4$. If the first approximation $C_0$ is obtained to one place accuracy,
then the probability that the sequence $C_1, C_2, \cdots$ will converge to $A^{-1}$ will be
greater than .999. Using formula (4.13), we obtain $k = .53$. Since one is the
first integer greater than .53, the table verifies the use of the formula.

Although the formula was developed on the assumption that $p$ is large, every
value calculated is consistent with the table. This lends support to its use for
small values of $p$.
















The Probability of Convergence of the Iterative Process*

   m      k      p = 2          p = 3          p = 4          p = 5

  1/2    -1      0+             0+             0+             0+
          0      [.05] .982     [.33] .199     [.53] 0+       [.70] 0+
          1      1-             1-             1-             1-

   2     -1      0+             0+             0+             0+
          0      [.85] .051     [.93] 0+       0+             0+
          1      1-             1-             [1.13] .715    [1.30] [illegible]
          2      1-             1-             1-             1-

  10      0      0+             0+             0+             0+
          1      [1.35] .439    [1.63] 0+      [1.83] 0+      0+
          2      1-             1-             1-             [2.00] 1-

* "1-" means greater than .999.


It has already been pointed out that $k$ is not sensitive to rounding off of the
argument of the logarithm. Thus for $p = 20$ and $m = 2$, we can let $\log_{10} m = .3$,
$\log_{10} p = 1.3$, $\log_{10}(p + 3) = 1.36$ and obtain

    $k = .3 + 1.3 + 1.36 - .55 = 2.41$,

from which it follows that three decimal place accuracy in $C_0$ will practically
insure convergence of the iterative process.
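The arithmetic of this example can be reproduced with a short function. The sketch below evaluates the right-hand side of (4.13) as printed and takes the first integer greater than it; the function name is hypothetical.

```python
import math


def required_places(p, m):
    """Right-hand side of (4.13) and the first integer greater than it."""
    bound = math.log10(m) + math.log10(p) + math.log10(p + 3) - 0.55
    return bound, math.floor(bound) + 1


bound, places = required_places(20, 2)   # bound is about 2.41, places is 3
```

Using unrounded logarithms changes the bound only in the third decimal, in line with the remark that $k$ is not sensitive to rounding of the logarithms.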


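5. The mean, variance, and exact distribution. To obtain the moments of
$N_D^2$, the most convenient form to use is (3.8). Since the $\chi^2_{(p)i}$ are independent,

(5.1)    $E[N_D^2] = E[\sum_i \sigma_i\chi^2_{(p)i}] = p \sum_i \sigma_i$,

(5.2)    $E[N_D^4] - (E[N_D^2])^2$
             $= E[\sum_i \sigma_i^2(\chi^2_{(p)i})^2 + 2\sum_{i<j} \sigma_i\sigma_j\chi^2_{(p)i}\chi^2_{(p)j}] - (p\sum_i \sigma_i)^2$
             $= (2p + p^2)\sum_i \sigma_i^2 + 2p^2\sum_{i<j} \sigma_i\sigma_j - p^2\sum_i \sigma_i^2 - 2p^2\sum_{i<j} \sigma_i\sigma_j$
             $= 2p\sum_i \sigma_i^2$.

These can be expressed in terms of the elements of $A$ and the variance of the
elements of $E$, since

(5.3)    $\sum_i \sigma_i = \mathrm{tr}(\sigma) = \mu\,\mathrm{tr}(AA') = \mu N_A^2$,
         $\sum_i \sigma_i^2 = \mathrm{tr}\,\sigma^2 = \mu^2\,\mathrm{tr}(AA'AA')$.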
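Formulas (5.1) to (5.3) give the mean and variance of $N_D^2$ directly from $A$ and the number of decimal places $k$. The sketch below assumes the variance $\mu = 10^{-2k}/12$ of section 2; the function name and the unit matrix used as a check are illustrative choices.

```python
def norm_moments(A, k):
    """Mean and variance of N_D^2 from (5.1)-(5.3), with mu = 10^(-2k)/12."""
    p = len(A)
    mu = 10.0 ** (-2 * k) / 12.0
    # G = A A'
    G = [[sum(A[i][t] * A[j][t] for t in range(p)) for j in range(p)]
         for i in range(p)]
    tr_G = sum(G[i][i] for i in range(p))
    # tr(G G) = sum over i, j of G[i][j] * G[j][i]
    tr_G2 = sum(G[i][j] * G[j][i] for i in range(p) for j in range(p))
    mean = p * mu * tr_G
    variance = 2 * p * mu ** 2 * tr_G2
    return mean, variance


unit = [[1.0, 0.0], [0.0, 1.0]]
mean, variance = norm_moments(unit, 1)
```

For the unit matrix of order two, $D = E$ and $N_D^2$ is the sum of four independent squares, each with mean $\mu$ and variance $2\mu^2$, which agrees with the values $4\mu$ and $8\mu^2$ returned.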






The exact distribution of $N_D^2$ can be obtained readily when $p$ is even. In this
case the infinite integral

(5.4)    $\dfrac{1}{2\pi}\displaystyle\int_{-\infty}^{\infty} \dfrac{e^{-itN_D^2}\,dt}{(1 - i2\sigma_1 t)^{p/2} \cdots (1 - i2\sigma_p t)^{p/2}}$

can be evaluated by contour integration. The integral satisfies the conditions
given in Whittaker and Watson [3, sec. 6.22], if the semicircle of the contour is
taken on the lower half of the complex plane.

For the case $p = 2$, for example, there are simple poles at $t = -i/(2\sigma_1)$ and
$t = -i/(2\sigma_2)$. The sum of the residues at these poles, multiplied by $-i$, yields
the exact distribution:

(5.5)    $\dfrac{e^{-N_D^2/2\sigma_1}}{2(\sigma_1 - \sigma_2)} + \dfrac{e^{-N_D^2/2\sigma_2}}{2(\sigma_2 - \sigma_1)}$.

For even values of $p$ greater than 2, the values of the residues can be obtained
by repeated differentiation.
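For $p = 2$ the density (5.5) can be integrated in closed form, which gives the probability that $N_D^2$ is less than one without any bound. The sketch below assumes two distinct roots; the function names and the numerical values of the roots are hypothetical.

```python
import math


def density_p2(x, s1, s2):
    """Density (5.5) of N_D^2 for p = 2 and distinct roots s1, s2."""
    return (math.exp(-x / (2 * s1)) - math.exp(-x / (2 * s2))) / (2 * (s1 - s2))


def cdf_p2(x, s1, s2):
    """Integral of (5.5) from 0 to x."""
    return 1.0 - (s1 * math.exp(-x / (2 * s1))
                  - s2 * math.exp(-x / (2 * s2))) / (s1 - s2)


# Probability that N_D^2 < 1 for illustrative roots .3 and .1.
prob_convergence = cdf_p2(1.0, 0.3, 0.1)
```

Differentiating `cdf_p2` with respect to `x` returns `density_p2`, and the total probability is one, which is a quick check on the signs in (5.5).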


6. Summary. We are now in a position to discuss the applicability of the
results of this paper to the problem which arises most frequently in practice.
The elements of the first approximation to the inverse will deviate from their
expected values only because the first approximation is carried to a limited
number of places, say $k$. In this case the deviations will be distributed with
constant density over a range of length $10^{-k}$. The elements of $E$, the matrix of
deviations,

(6.1)    $E = A^{-1} - C_0$,

are now each independently distributed, but with uniform density, range $10^{-k}$
and mean equal to zero. From (2.4),

(6.2)    $D = AE$,

we observe that each element of $D$ will be a linear combination of $p$ independently
and rectangularly distributed variables, each with mean zero and range $10^{-k}$.
The analysis of sections 3, 4, and 5 will be valid if the $d_{ij}$ can be considered to be
normally distributed.

There is much experimental evidence and theoretical justification for assuming
that the elements of $D$ are normally distributed. A sufficient condition that
the $d_{ij}$ approach normality as $p$ increases is that the sum of the $a_{ij}^2$ in any row of
$A$ be divergent as the order of the matrix approaches infinity, while at the same
time every element be less than some constant value independent of the order
of the matrix [4].

The experimental and theoretical evidence supporting the approach of the
$d_{ij}$ to normality, the fact that the logarithms are insensitive to errors of
approximation in their arguments, and the fact that the lower bounds to the probability
of convergence of the iterative process are used, all lend support to the formula

    $k > \log_{10} m + \log_{10} p + \log_{10}(p + 3) - .55$

for determining the number of places ($k$) necessary in the first approximation
($C_0$) to the inverse of $A$, a matrix of order $p$ whose elements have mean size $m$,
to make the probability at least .999 that the process of iteration will yield a
sequence of matrices which will converge to the true inverse. The ultimate
justification of the use of this formula can only be by the results of its
application in practice.


REFERENCES 


[1] Harold Hotelling, "Some new methods in matrix calculation," Annals of Math. Stat.,
Vol. 14 (1943), pp. 1-34.

[2] Fisher and Yates, Statistical Tables for Biological, Agricultural and Medical Research,
p. 31.

[3] Whittaker and Watson, Modern Analysis, 4th edition, pp. 113-114.

[4] J. V. Uspensky, Introduction to Mathematical Probability, 1935, Chapter 14.

[5] E. B. Wilson and M. M. Hilferty, "The distribution of chi-square," Proc. Nat. Acad.
Science, Vol. 17 (1931), pp. 684-688.





NOTES 


This section is devoted to brief research and expository articles, notes on methodology 
and other short items. 




ON DISTRIBUTION-FREE TOLERANCE LIMITS IN 
RANDOM SAMPLING 


By HERBERT ROBBINS 


Post Graduate School, Annapolis, Md. 


Let $X_1, \cdots, X_n$ be independent random variables each with the continuous
and differentiable cumulative distribution function $\sigma(x) = \Pr(X_i < x)$. A
continuous function $f(x_1, \cdots, x_n)$ with the property that the random variable
$Y = \sigma(f(X_1, \cdots, X_n))$ has a probability distribution which is independent of
$\sigma(x)$ will be called a distribution-free upper tolerance limit¹ (d. f. u. t. l.). We
shall prove

THEOREM 1. A necessary and sufficient condition that the continuous function
$f(x_1, \cdots, x_n)$ be a d. f. u. t. l. is that the function

    $\bar{f}(x_1, \cdots, x_n) = \prod_{i=1}^{n} [f(x_1, \cdots, x_n) - x_i]$

be identically zero.


PROOF. Since $f$ is continuous, we can prove the necessity of the condition by
deriving a contradiction from the assumption that $f$ is a d. f. u. t. l. for which
there exist distinct numbers $a_1, \cdots, a_n$ such that $f(a_1, \cdots, a_n) = A \neq a_i$
$(i = 1, \cdots, n)$.

Since the numbers $a_1, \cdots, a_n, A$ are distinct, there will exist a positive
number $\epsilon$ such that the $(n + 1)$ intervals

    $I$:   $A - \epsilon \leq x \leq A + \epsilon$,
    $I_i$: $a_i - \epsilon \leq x \leq a_i + \epsilon$   $(i = 1, \cdots, n)$,

have no points in common. Moreover, since $f$ is continuous, there will
correspond to $\epsilon$ a positive number $\epsilon_1 < \epsilon$ such that

    $A - \epsilon \leq f(x_1, \cdots, x_n) \leq A + \epsilon$

provided that simultaneously

    $|x_i - a_i| < \epsilon_1$   $(i = 1, \cdots, n)$.

Now let $p$ be any number between 1/3 and 2/3. Corresponding to $p$ we define
the function $\sigma_p(x)$ as follows. In the interval $I$ we set $\sigma_p(x) = p$. In every
interval

    $J_i$: $a_i - \epsilon_1 \leq x \leq a_i + \epsilon_1$   $(i = 1, \cdots, n)$

we let $\sigma_p(x)$ increase an amount $1/(3n)$. Outside the intervals $I, J_1, \cdots, J_n$
we define $\sigma_p(x)$ in any manner so that it is continuous, differentiable, and
non-decreasing for every $x$, and has the properties $\sigma_p(-\infty) = 0$, $\sigma_p(\infty) = 1$. It is
clear that we can do this.

¹ Cf. S. S. Wilks, Mathematical Statistics, Princeton University Press (1943), pp. 93-94.

Let $S$ denote the set of all points $(x_1, \cdots, x_n)$ of $n$ dimensional space such
that simultaneously

    $|x_i - a_i| < \epsilon_1$   $(i = 1, \cdots, n)$.

Then by construction, for $\sigma_p(x)$ defined above,

    $\Pr((X_1, \cdots, X_n) \in S) = (1/3n)^n$.

But if $(X_1, \cdots, X_n) \in S$, then by construction,

    $A - \epsilon \leq f(X_1, \cdots, X_n) \leq A + \epsilon$

and

    $Y = \sigma_p(f(X_1, \cdots, X_n)) = p$.

Hence for $\sigma(x) = \sigma_p(x)$ we have

    $\Pr(Y = p) \geq (1/3n)^n$.

But since $f$ is a d. f. u. t. l., this inequality must hold for any $\sigma(x)$.
Now choose a set of numbers

    $1/3 < p_1 < p_2 < \cdots < p_m < 2/3$,

where $m = 2(3n)^n$. Then from the above,

    $\Pr(Y = \text{one of the numbers } p_1, \cdots, p_m) \geq 2$.

This is the desired contradiction.

Let $O_r(x_1, \cdots, x_n)$ be the function whose value is the $r$th term when the
numbers $x_1, \cdots, x_n$ are arranged in non-decreasing order of magnitude. In terms
of the functions $O_r$ we can characterize the continuous functions $f$ which satisfy
the identity $\bar{f} \equiv 0$ as follows. Let $i_1, \cdots, i_n$ be a permutation of the integers
$1, \cdots, n$. Denote by $E(i_1, \cdots, i_n)$ the set of all points $(x_1, \cdots, x_n)$ such that

    $x_{i_1} < x_{i_2} < \cdots < x_{i_n}$.

The $n!$ sets $E$ are open and disjoint. Since $f$ is continuous and $\bar{f} \equiv 0$, in each
$E(i_1, \cdots, i_n)$ we must have, for some $r$,

    $f(x_1, \cdots, x_n) = O_r(x_1, \cdots, x_n)$,

where the integer $r = r(i_1, \cdots, i_n)$ must depend on the permutation $i_1, \cdots, i_n$
in such a way that $f$ may be extended continuously over the whole space. (The
condition for this is as follows. Two permutations $i_1, \cdots, i_n$ and $j_1, \cdots, j_n$
may be called adjacent if they differ only by an interchange of two adjacent
integers. Then for any two adjacent permutations, either $r(i_1, \cdots, i_n) =
r(j_1, \cdots, j_n)$ or the two values of $r$ are the two interchanged integers. For
example, the function

    $f(x_1, x_2, x_3) = O_3(x_1, x_2, x_3)$ if $O_3(x_1, x_2, x_3) = x_1$,
    $f(x_1, x_2, x_3) = O_2(x_1, x_2, x_3)$ otherwise,

satisfies this requirement.)

We shall now prove that the necessary condition, $\bar{f} \equiv 0$, of Theorem 1 is
sufficient to ensure that the continuous function $f$ be a d. f. u. t. l. From the
argument of the preceding paragraph, any continuous function $f$ such that
$\bar{f} \equiv 0$ will in each set $E(i_1, \cdots, i_n)$ have the value $O_r(x_1, \cdots, x_n)$, where $r$ is
an integer from 1 to $n$. Since the variables $X_1, \cdots, X_n$ are independent and
have the same probability distribution, the probability that $(X_1, \cdots, X_n)$ will
belong to $E(i_1, \cdots, i_n)$ is equal to $(1/n!)$ for every permutation $i_1, \cdots, i_n$.
Let

    $W = f(X_1, \cdots, X_n)$.

Then if $\varphi(x) = d\sigma(x)/dx$ denotes the probability density function of each $X_i$,
the conditional p. d. f. of $W = O_r(X_1, \cdots, X_n)$, given that $(X_1, \cdots, X_n)$
belongs to $E(i_1, \cdots, i_n)$, will be $n!\,\psi_r(w)$, where

    $\psi_r(w) = \dfrac{\varphi(w)\,\sigma^{r-1}(w)\,[1 - \sigma(w)]^{n-r}}{(r - 1)!\,(n - r)!}$.

Thus $\psi_r(w)$ will be of the form

    $\psi_r(w) = \varphi(w)F_r(\sigma(w))$,

where $F_r(\sigma(w))$ is a polynomial in $\sigma(w)$. Hence the conditional p. d. f. of
$Y = \sigma(W)$, given that $(X_1, \cdots, X_n)$ belongs to $E(i_1, \cdots, i_n)$, will be $n!\,\xi_r(y)$, where

    $\xi_r(y) = F_r(y)$,

and the p. d. f. of $Y$ will be

    $\xi(y) = \sum F_r(y)$,

where the summation is over the $n!$ integers $r = r(i_1, \cdots, i_n)$. This is
independent of $\sigma(x)$, so that $f$ is a d. f. u. t. l. This completes the proof of Theorem 1.

A function $f(x_1, \cdots, x_n)$ is symmetric if its value is unchanged by any
permutation of its arguments. It is clear that the only continuous and symmetric
functions $f$ which satisfy the identity $\bar{f} \equiv 0$ are the $n$ functions $O_r(x_1, \cdots, x_n)$.
Hence we can state

THEOREM 2. The only symmetric d. f. u. t. l.'s are the $n$ functions $O_r(x_1, \cdots, x_n)$
$(r = 1, \cdots, n)$.
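The distribution-free property of the order statistics $O_r$ can be illustrated by simulation. The sketch below is a rough numerical check and not part of the proof: it draws samples from two different populations, an exponential and a uniform one chosen arbitrarily for this example, and compares the mean of $Y = \sigma(O_r(X_1, \cdots, X_n))$ with $r/(n + 1)$, the mean of the Beta distribution with parameters $r$ and $n - r + 1$ that $Y$ is known to follow. All names in the code are hypothetical.

```python
import math
import random


def order_statistic(values, r):
    """O_r: the r-th smallest of the values (r = 1, ..., n)."""
    return sorted(values)[r - 1]


def mean_of_y(draw, cdf, n, r, trials, rng):
    """Monte Carlo mean of Y = cdf(O_r(X_1, ..., X_n))."""
    total = 0.0
    for _ in range(trials):
        sample = [draw(rng) for _ in range(n)]
        total += cdf(order_statistic(sample, r))
    return total / trials


rng = random.Random(0)          # fixed seed, so the run is repeatable
n, r, trials = 5, 4, 20000

# Exponential population: cdf 1 - exp(-x).
mean_exponential = mean_of_y(lambda g: g.expovariate(1.0),
                             lambda x: 1.0 - math.exp(-x), n, r, trials, rng)
# Uniform population on (2, 5): cdf (x - 2)/3.
mean_uniform = mean_of_y(lambda g: g.uniform(2.0, 5.0),
                         lambda x: (x - 2.0) / 3.0, n, r, trials, rng)
expected = r / (n + 1)          # mean of the Beta(r, n - r + 1) law
```

Both simulated means should lie close to `expected`, whatever continuous population is substituted, up to sampling error.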







A FORMULA FOR SAMPLE SIZES FOR POPULATION 
TOLERANCE LIMITS 


By H. Scheffé and J. W. Tukey
Princeton University 


In a paper to appear in a later issue of this journal dealing with various results 
on non-parametric estimation, we shall discuss in detail an approximate formula 
for the numerical calculation of sample sizes for Wilks’ population tolerance 
limits. Because of the practical usefulness of this formula, it seems desirable 
to make it available without delay. Its accuracy is adequate for all direct ap- 
plications. 

An interval $I$ is said to cover a proportion $\pi$ of a univariate population with
cumulative distribution function $F(x)$ if $\int_I dF = \pi$. Let $X_1, X_2, \cdots, X_n$
be a random sample from the population, and $Z_1 \leq Z_2 \leq \cdots \leq Z_n$ be a
rearrangement of $X_1, X_2, \cdots, X_n$. Define $Z_0 = -\infty$, $Z_{n+1} = +\infty$, and consider
the proportion $B$ of the population covered by the random interval $(Z_k, Z_{n-m+1})$.
Then $\Pr\{B \geq b\}$ is independent of $F(x)$ if $F(x)$ is continuous¹, and equals
$1 - I_b(n - r + 1, r)$, where $r = k + m$ and $I_x(p, q)$ is K. Pearson's notation
for the incomplete Beta function.

Choose a confidence coefficient $1 - \alpha$, a pair of positive integers $k$, $m$, and a
fraction $b$. The sample size $n$ for which we can make the statement "the
probability is $1 - \alpha$ that the random interval $(Z_k, Z_{n-m+1})$ cover a proportion $b$
or more of the population" is then determined by the equation

(1.1)    $I_b(n - r + 1, r) = \alpha$,

where $r = k + m$. Our approximate solution is

(1.2)    $n = \tfrac{1}{4}\chi^2_\alpha (1 + b)/(1 - b) + \tfrac{1}{2}(r - 1)$,

where $\chi^2_\alpha$ is the $100\alpha$ percent point on the $\chi^2$-distribution with $2r$ degrees of
freedom. The required values of $\chi^2_\alpha$ may be found for $\alpha = .1, .05, .025, .01,
.005$ in Catherine Thompson's table [2]. For this range of $\alpha$, and for $b \geq .9$,
extensive numerical calculations indicate that the error of (1.2) is less than one
tenth of one percent, and is always positive, that is, $n$ is slightly overestimated
by (1.2). We have not yet obtained an analytic proof of this statement, which
refers to the difference from the exact (and, in general, non-integral) solution of
(1.1).
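A worked instance of (1.2) may help fix the notation. The sketch below assumes $k = m = 1$ (so $r = 2$), $b = .95$ and $\alpha = .05$, and takes 9.488 as the upper 5 percent point of $\chi^2$ with 4 degrees of freedom from standard tables; these choices, and the function names, are illustrative only. For $r = 2$ the incomplete Beta function has the closed form $I_b(n - 1, 2) = nb^{n-1} - (n - 1)b^n$, which allows a direct check against (1.1).

```python
def approximate_sample_size(chi2_point, b, r):
    """Formula (1.2): n = (1/4) chi2 (1 + b)/(1 - b) + (1/2)(r - 1)."""
    return 0.25 * chi2_point * (1 + b) / (1 - b) + 0.5 * (r - 1)


def incomplete_beta_r2(b, n):
    """I_b(n - 1, 2) in closed form, i.e. the left side of (1.1) for r = 2."""
    return n * b ** (n - 1) - (n - 1) * b ** n


# 9.488: upper 5% point of chi-square with 2r = 4 degrees of freedom.
n_approx = approximate_sample_size(9.488, 0.95, 2)
check = incomplete_beta_r2(0.95, 93)
```

Here `n_approx` comes to about 93.0, and `check`, the left side of (1.1) at $n = 93$, is within .001 of the target $\alpha = .05$.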

As explained elsewhere [1], formula (1.2) may be used for Wald’s solution 
of the multivariate case. 


REFERENCES 


[1] H. Scheffé, Annals of Math. Stat., Vol. 14 (1943), p. 324.
[2] C. M. Thompson, Biometrika, Vol. 32 (1941), p. 189.


¹ That the theory is valid in this case we show later. Previous proofs have required
the continuity of $F'(x)$.









































































































































A GENERALIZATION OF WARING’S FORMULA 
By T. N. E. Greville


Bureau of the Census 





Waring's formula (frequently, but less correctly, called Lagrange's formula)
gives the polynomial of degree $n$ taking on specified values for $n + 1$ distinct
arguments. It is frequently used for interpolation purposes in dealing with
functions for which numerical values are given at unequal intervals. This
formula may be written in the form:

(1)    $f(x) = \sum_{i=0}^{n} \left[ f(a_i) \prod_{j(\neq i)=0}^{n} \dfrac{x - a_j}{a_i - a_j} \right]$,

where $a_0, a_1, a_2, \cdots, a_n$ are the arguments for which the value of the
polynomial $f(x)$ is given. This formula was first published by Waring [2] in 1779,
and it was not until 1795 that Lagrange gave it in his book: Leçons Élémentaires
sur les Mathématiques. The prominent British actuary and mathematician,
Mr. D. C. Fraser, states that "there are identities of notation in the statement of
the formula which leave little doubt that Lagrange was simply quoting from
Waring's paper." Waring's priority was brought to my attention by Mr. Fraser
and by Dr. W. Edwards Deming.
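Formula (1) translates directly into a few lines of code. The sketch below uses exact rational arithmetic so that no rounding enters; the function name and the sample data, values of $x^2$ at the arguments 0, 1, 3, are hypothetical.

```python
from fractions import Fraction


def waring_interpolate(args, values, x):
    """Evaluate formula (1) at x for distinct arguments a_0, ..., a_n."""
    total = Fraction(0)
    for i, (a_i, f_i) in enumerate(zip(args, values)):
        term = Fraction(f_i)
        for j, a_j in enumerate(args):
            if j != i:
                term *= Fraction(x - a_j, a_i - a_j)
        total += term
    return total


# Values of x^2 at the unequally spaced arguments 0, 1, 3.
result = waring_interpolate([0, 1, 3], [0, 1, 9], 2)   # equals 4
```

Because the data come from a polynomial of degree 2 and three arguments are used, the interpolated value at $x = 2$ reproduces $2^2 = 4$ exactly.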

If any two or more of the arguments $a_i$ are equal, the form (1) becomes
indeterminate. However, the limiting value, as $m + 1$ specified arguments approach
a common value $a$, can be shown to be an expression involving the first $m$
derivatives of the polynomial $f(x)$ for the argument $a$. This case of "repeated
arguments" is of considerable interest, especially in connection with the theory of
osculatory, or smooth-junction interpolation [1, p. 33]. It is the purpose of this
note to generalize the formula (1) to the case in which not only the value of
$f(x)$ but also of its first $m_i$ derivatives are given for each argument $a_i$. The
degree of the polynomial represented, which we shall denote by $N$, is
$n + \sum_{i=0}^{n} m_i$.

The generalized formula is:

(2)    $f(x) = \sum_{i=0}^{n} \left[ P_i(x - a_i) \prod_{j(\neq i)=0}^{n} \left( \dfrac{x - a_j}{a_i - a_j} \right)^{m_j+1} \right]$,

where $P_i(x - a_i)$ denotes a polynomial in $x - a_i$ obtained by the following
procedure. First, $f(x)$ is expanded in a Taylor series in powers of $x - a_i$. Next,
the expression $\left(1 + \dfrac{x - a_i}{a_i - a_j}\right)^{-m_j-1}$ is expanded as a binomial series for every $j$
different from $i$. Finally, all the $n + 1$ expansions ($n$ binomial and one Taylor)
are multiplied together, and all terms containing powers of $x - a_i$ higher than
$m_i$ are rejected. This formula has already been given by Steffensen [1, p. 33]
for the particular case in which every $m_i = 1$.

The general formula (2) is difficult to arrive at without a previous knowledge
of the result, but is easily shown to be the correct expression. Upon differentiating
$k$ times $(0 \leq k \leq m_r)$ all the terms in the summation except the one
corresponding to $i = r$ will contain the factor $(x - a_r)^{m_r+1-k}$ and will therefore
vanish for $x = a_r$. Moreover, the non-vanishing term, before differentiation,
will agree, up to and including terms containing $(x - a_r)^{m_r}$, with the Taylor
expansion of $f(x)$ in powers of $x - a_r$, since the product expression within the
brackets will be exactly canceled, as far as terms of degree $m_r$, by the $n$ binomial
expansions. Hence the $k$th derivative of the non-vanishing term in the summation
will be $f^{(k)}(a_r)$ for $x = a_r$. This establishes the formula.
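The construction of the polynomials $P_i$ in formula (2) is mechanical and can be sketched in code. The example below is a hedged illustration, not the author's procedure as programmed: it builds the truncated product of the Taylor and binomial expansions with exact rational arithmetic and then evaluates (2). The function names and the test data, $f(x) = x^3$ with values and first derivatives given at 0 and 1, are assumptions of this example.

```python
from fractions import Fraction


def truncated_product(u, v, degree):
    """Product of two polynomials (coefficient lists, lowest power first),
    rejecting all powers above `degree`."""
    out = [Fraction(0)] * (degree + 1)
    for s, cu in enumerate(u):
        for t, cv in enumerate(v):
            if s + t <= degree:
                out[s + t] += cu * cv
    return out


def binomial_series(d, s, degree):
    """Coefficients of (1 + h/d)^(-s) in powers of h, up to h^degree."""
    coeffs = [Fraction(1)]
    for t in range(1, degree + 1):
        coeffs.append(coeffs[-1] * Fraction(-s - t + 1, t) / d)
    return coeffs


def generalized_waring(args, derivs, x):
    """Evaluate formula (2); derivs[i] = [f(a_i), f'(a_i), ..., f^(m_i)(a_i)]."""
    total = Fraction(0)
    for i, a_i in enumerate(args):
        m_i = len(derivs[i]) - 1
        # Taylor coefficients f^(k)(a_i) / k!
        poly, factorial = [], 1
        for k, d_k in enumerate(derivs[i]):
            if k > 0:
                factorial *= k
            poly.append(Fraction(d_k, factorial))
        weight = Fraction(1)
        for j, a_j in enumerate(args):
            if j == i:
                continue
            s = len(derivs[j])                      # m_j + 1
            series = binomial_series(Fraction(a_i - a_j), s, m_i)
            poly = truncated_product(poly, series, m_i)
            weight *= Fraction(x - a_j, a_i - a_j) ** s
        h = Fraction(x - a_i)
        p_i = sum(c * h ** t for t, c in enumerate(poly))
        total += p_i * weight
    return total


# f(x) = x^3: value and first derivative at 0 and at 1, so N = 3.
value = generalized_waring([0, 1], [[0, 0], [1, 3]], 2)   # equals 8
```

With every $m_i = 1$, as here, the result is the osculatory case attributed to Steffensen in the text; since the data come from a cubic and $N = 3$, the value at $x = 2$ is reproduced exactly.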

This formula is clearly equivalent to the Newton divided difference interpolation formula with repeated arguments [1, p. 33], the argument $a_i$ occurring $m_i + 1$ times. Therefore, if f(x) is any function other than a polynomial of degree N or less, it is necessary to add a remainder term [1, pp. 22-23] of the form

$$f_N(x) \prod_{i=0}^{n} (x - a_i)^{m_i+1},$$

where $f_N(x)$ denotes the limiting value [1, pp. 20-21] of the divided difference of order N involving the arguments $x, a_0, a_1, \ldots, a_n$, with each argument $a_i$ appearing $m_i + 1$ times. The existence of all the indicated derivatives is, of course, essential.
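
The equivalence with the Newton divided difference formula with repeated arguments can be illustrated numerically. The following is a minimal Python sketch and is not part of the original note; the function names `hermite_newton` and `evaluate`, and the test function f(x) = x^3, are illustrative assumptions. It builds the divided difference table with each argument a_i repeated m_i + 1 times, and replaces any difference whose arguments all coincide by its limiting value f^(k)(a_i)/k!.

```python
from fractions import Fraction
from math import factorial


def hermite_newton(nodes, derivs):
    """Newton coefficients for interpolation with repeated arguments.

    nodes  : distinct arguments a_0, ..., a_n
    derivs : derivs[i] = [f(a_i), f'(a_i), ..., f^(m_i)(a_i)]
    Returns (z, coef), where z lists each a_i repeated m_i + 1 times.
    """
    z, idx = [], []
    for i, a in enumerate(nodes):
        for _ in derivs[i]:
            z.append(Fraction(a))
            idx.append(i)
    size = len(z)
    # table[k][j] holds the divided difference f[z_j, ..., z_{j+k}]
    table = [[Fraction(derivs[idx[j]][0]) for j in range(size)]]
    for k in range(1, size):
        row = []
        for j in range(size - k):
            if z[j] == z[j + k]:
                # all k + 1 arguments coincide: use the limit f^(k)(a) / k!
                row.append(Fraction(derivs[idx[j]][k]) / factorial(k))
            else:
                row.append((table[k - 1][j + 1] - table[k - 1][j])
                           / (z[j + k] - z[j]))
        table.append(row)
    coef = [table[k][0] for k in range(size)]
    return z, coef


def evaluate(z, coef, x):
    """Evaluate the Newton form by nested multiplication."""
    x = Fraction(x)
    result = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        result = result * (x - z[k]) + coef[k]
    return result


# Assumed example: f(x) = x**3 with a_0 = 0, a_1 = 1 and m_0 = m_1 = 1,
# so that N = n + sum(m_i) = 1 + 2 = 3 and the cubic is reproduced exactly.
z, coef = hermite_newton([0, 1], [[0, 0], [1, 3]])
print(evaluate(z, coef, 2))
```

Exact rational arithmetic is used so that the reproduction of a polynomial of degree N can be checked without rounding error.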


REFERENCES 


[1] J. F. Steffensen, Interpolation, Baltimore, 1927. 
[2] E. Waring, "Problems concerning interpolation," Phil. Trans. Royal Soc., Vol. 69
(1779), pp. 59-67.


NOTE ON THE VARIANCE AND BEST ESTIMATES 


By H. G. Lanpau 
Washington, D.C. 


The purpose of this note is to point out a certain relation between the variances, $\sigma_1^2$ and $\sigma_2^2$, of the random variables, $x_1$ and $x_2$, and the probabilities,

$$P_1(t) = \Pr\{\,|x_1 - E(x_1)| < t\,\},$$
$$P_2(t) = \Pr\{\,|x_2 - E(x_2)| < t\,\}.$$

That is, if $\sigma_1^2 < \sigma_2^2$, then $P_1(t) > P_2(t)$ in at least one interval, $t_1 < t < t_2$.

A note by A. T. Craig [1] gave an example for which it was stated that $\sigma_1^2 < \sigma_2^2$ and $P_1(t) < P_2(t)$ for every t; but, as was pointed out by Neyman [2], calculation of the probabilities involved shows the statement to be incorrect.

The present result provides a certain justification for the use of minimum variance estimates by assuring that no other estimate with the same mean can have, for every value of t, a greater probability of a deviation from the mean less than t. If an estimate can be found which has a greater value of P(t) for all t than does any other estimate, it is necessarily the minimum variance estimate.

The theorem below includes a similar relation for equal variances. This 
theorem can be obtained from known general results on inequalities for distri- 
butions determined by moments, [3] and [4]. The formulation given here with 
its significance for estimates does not appear to have been remarked. 


THEOREM. If the random variables, $x_1$ and $x_2$, have finite variances, $\sigma_1^2$ and $\sigma_2^2$, and

$$\sigma_1^2 \le \sigma_2^2,$$

then, either

$$Q(t) = P_1(t) - P_2(t)$$

is equal to zero at all points of continuity, which can occur only for $\sigma_1^2 = \sigma_2^2$, or there is an interval, $t_1 < t < t_2$, in which Q(t) is positive.
PROOF. We write the variance as the Stieltjes integral,

$$\sigma_1^2 = \int_0^\infty t^2 \, dP_1(t),$$

and similarly for $\sigma_2^2$.
Let

$$S(T) = \int_0^T t^2 \, dP_1(t) - \int_0^T t^2 \, dP_2(t) = \int_0^T t^2 \, dQ(t) = T^2 Q(T) - 2 \int_0^T t\,Q(t) \, dt,$$

integrating by parts.
Now

$$T^2 [1 - P_1(T)] = T^2 \int_T^\infty dP_1(t) \le \int_T^\infty t^2 \, dP_1(t),$$

and since $\sigma_1^2$ is finite, $\int_T^\infty t^2 \, dP_1(t) \to 0$ as $T \to \infty$, so that $\lim_{T \to \infty} T^2 [1 - P_1(T)] = 0$, and similarly for $P_2(t)$.
Hence $T^2 Q(T) = T^2 [1 - P_2(T)] - T^2 [1 - P_1(T)] \to 0$ as $T \to \infty$, and since by definition $\lim_{T \to \infty} S(T) = \sigma_1^2 - \sigma_2^2$ it follows that

$$\sigma_1^2 - \sigma_2^2 = -2 \int_0^\infty t\,Q(t) \, dt.$$

From this it can be seen that either Q(t) vanishes at all points of continuity, in which case $\sigma_1^2 = \sigma_2^2$, or Q(t) must be positive in some interval, since otherwise $\int_0^\infty t\,Q(t) \, dt$ must be negative and hence $\sigma_1^2 - \sigma_2^2 > 0$, contrary to the assumption $\sigma_1^2 \le \sigma_2^2$.
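
The identity relating the difference of the variances to the integral of tQ(t) can be checked on a simple case. The sketch below is an illustration only and is not taken from the note: it assumes x_1 uniform on (-1, 1) and x_2 uniform on (-2, 2), for which the variances are 1/3 and 4/3, and it approximates the integral by a midpoint rule. The names `P1`, `P2`, `Q` and `integral_tQ` are hypothetical.

```python
def P1(t):
    # Pr{|x1 - E(x1)| < t} for x1 uniform on (-1, 1), t >= 0 (assumed example)
    return min(t, 1.0)


def P2(t):
    # Pr{|x2 - E(x2)| < t} for x2 uniform on (-2, 2), t >= 0 (assumed example)
    return min(t / 2.0, 1.0)


def Q(t):
    return P1(t) - P2(t)


def integral_tQ(upper=2.0, steps=20000):
    # Midpoint rule for the integral of t * Q(t) over (0, upper).
    # Q(t) = 0 for t >= 2 in this example, so upper = 2 covers the whole range.
    h = upper / steps
    return sum((k + 0.5) * h * Q((k + 0.5) * h) for k in range(steps)) * h


var1 = 1.0 / 3.0   # variance of the uniform distribution on (-1, 1)
var2 = 4.0 / 3.0   # variance of the uniform distribution on (-2, 2)

lhs = var1 - var2
rhs = -2.0 * integral_tQ()
print(lhs, rhs)
```

In this example Q(t) is positive throughout 0 < t < 2, in agreement with the theorem, and the two sides of the identity agree up to the error of the numerical integration.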








REFERENCES 


[1] A. T. Craig, "A note on the best linear estimate," Annals of Math. Stat., Vol. 14 (1943),
pp. 88-90.

[2] J. Neyman, Math. Reviews, Vol. 4 (1943), p. 280.

[3] J. V. Uspensky, Introduction to Mathematical Probability, New York, McGraw-Hill
(1937), pp. 373-380.

[4] A. Wald, "Limits of a distribution function determined by absolute moments and
inequalities satisfied by absolute moments," Trans. Amer. Math. Soc., Vol. 46
(1939), pp. 280-306.









































NEWS AND NOTICES


Readers are invited to submit to the Secretary of the Institute news items of general interest.

Personal Items

Dr. R. L. Anderson, on leave from the North Carolina State College, is
serving as a research mathematician on a war research project at Princeton
University.

Professor Kenneth J. Arnold, on leave from the University of New Hampshire,
is doing war research in the Statistical Research Group, at Columbia University.

Dr. W. J. Dixon, on leave from the University of Oklahoma, is serving as a
research mathematician on a war research project at Princeton University.

Mr. R. M. Foster of Bell Telephone Laboratories has been appointed professor
and head of the department of mathematics of the Polytechnic Institute of
Brooklyn.

Dr. Hilda Geiringer, Lecturer at Bryn Mawr College, has been appointed
Professor of Mathematics and Head of the Mathematics Department at Wheaton
College, Norton, Massachusetts.

Assistant Professor E. H. C. Hildebrandt of the State Teachers College,
Upper Montclair, New Jersey, has been appointed to an assistant professorship
at Northwestern University.

Mr. John F. Kenney of the University of Wisconsin has been promoted to an
assistant professorship.

Assistant Professor L. A. Knowler of the University of Iowa has been promoted
to an associate professorship.

Dr. Saul B. Sells is now Head Statistician of the Statistical Standards Branch,
Office of Price Administration, Washington.

Mr. Walter H. Thompson, formerly an instructor at Virginia Polytechnic
Institute, is now associated with the Kellex Corporation, New York.

Professor Helen M. Walker of Teachers College, Columbia University, has 
been elected president of the American Statistical Association. 

Professor C. C. Wagner of Pennsylvania State College has been appointed
assistant dean of the School of Liberal Arts.

Dr. Edward Helly of the Illinois Institute of Technology died November 28,
1943.


New Members 


The following persons have been elected to membership in the Institute: 








Andrews, T. Gaylord. Ph.D. (Nebraska) Instructor in Psychology, Barnard College, 
Columbia University, New York City. 

Angell, Dorothy T. Member of Technical Staff, Bell Telephone Laboratories. Murray Hill, 
New Jersey. 

Barnes, John L. Member of Technical Staff, Bell Telephone Laboratories. 27 North 
Cherry Lane, Rumson, New Jersey. 




Bloom, Rose. B.A. (Hunter) Pvt., Billings General Hospital, Ft. Benj. Harrison, Indiana. 

Bonnar, Robert Underwood. M.S. (Univ. of Washington) Associate Chemist, Bureau of
Ships, Navy Dept. 414 Whitestone Road, Silver Spring, Maryland.

Brearty, C. R. B.S. (California) Officer in Charge, Quality Control Section, Signal Corps
Inspection Agency. 19 West 4th St., Dayton 2, Ohio.

Campbell, George Clyde. M.S. (Iowa) Captain CWS Chief, Fiscal Div., Pine Bluff 
Arsenal. Troy Road, RFD #1, Boonton, New Jersey. 

Clifford, Paul C. A.M. (Columbia) Asst. Prof. of Math., State Teachers College,
Montclair, New Jersey. 541 Upper Mountain Ave.

Cobb, William J. Statistician, Census Bureau. 4036 8th St., NE, Washington, D. C. 

Coggins, Paul Pond. M.A. (Harvard) Accountant, American Tel. and Tel. 195 Broad- 
way, New York, N.Y. 

Dietzold, Robert L. Ph.B. (Yale) M.T.S., Bell Telephone Lab. 34 W. 11th St., New York,
N. Y.

Elconin, Victor. M.S. (California Inst. of Technology) Associate Physicist, California 
Inst. of Technology. 740 Cordova Ave., Glendale 6, California. 

Ferrell, Enoch B. M.A. (Oklahoma) Member of Technical Staff, Bell Telephone Labora- 
tories. 75 Fuller Ave., Chatham, New Jersey. 

Ferris, Charles Duncan. A.B. (Princeton) Engineering Statistician, Surveillance Branch, 
T/3 Army. Ballistic Research Laboratory, Aberdeen Proving Ground, Md. 

Goldberg, Henry. M.A. (Columbia) Asst. Mathematical Statistician, Columbia Uni- 
versity, 401 West 118th Street, New York 27, N.Y. 

Gordon, Donald A. A.M. (Columbia) Assistant, Columbia University. 1327 E. 26
Street, Brooklyn, New York.

Greenleaf, Herrick E. H. Ph.D. (Indiana) Prof. of Mathematics. 1024 S. College Avenue,
Greencastle, Indiana.

Griffitts, C. H. Ph.D. (Michigan) Professor of Psychology. 1507 Charlton Ave., Ann 
Arbor, Michigan. 

Hadley, Clausin D. Ph.D. (Wisconsin) Statistician, Marketing Research Dept., Eli Lilly 
and Company, Indianapolis 6, Indiana. 

Halbert, K. W. A.M. (Harvard) Statistician, Amer. Tel. and Tel. Co., 195 Broadway,
New York, N. Y.

Hall, Marguerite F. Ph.D. (Michigan) Asst. Prof. Public Health Statistics. 25 Ridge- 
way, Ann Arbor, Michigan. 

Halmos, Paul Richard. Ph.D. (Illinois) Asst. Professor of Mathematics, Syracuse Uni- 
versity. 513 Fellows Avenue, Syracuse 10, N.Y. 

Harold, Miriam S. B.A. (Hunter) Member of Technical Staff, Bell Telephone Labora- 
tories. 19 Hillside Avenue, Chatham, New Jersey. 

Hatke, Sister M. Agnes. M.S. (Indiana State Teachers College) Graduate student at 
Purdue University. St. Francis College, Lafayette, Indiana. 

Hizon, Manuel O. M.A. (Michigan) Philippine Govt. Scholar at Univ. of Michigan. 
816 Packard, Ann Arbor, Michigan. 

Hodgkinson, William, Jr. B.A. (Harvard) Major, HQ AAF, U. S. Army. 195 Broadway,
New York, N. Y.

Jacob, Walter C. Ph.D. (Cornell) Lt., U. S. Navy. Bureau of Ships, Navy Dept., 
Washington, D. C. 

Jones, Howard L. C.P.A. (Illinois) Supervisor of Revenue Results, Illinois Bell Tele- 
phone Co. Room 1100, 309 W. Washington St., Chicago 6, Ill. 

Kaitz, Hyman B. A.B. (George Washington Univ.) Statistical Consultant, Psychological
Research Unit No. 11, B.A.A.F., Fort Myers, Fla.

Keyfitz, Nathan. B.Sc. (McGill) Statistician, Dominion Bureau of Statistics, Ottawa,
Canada. Billings Bridge, Ontario.

Kirchen, Calvin J. M.A. (Wisconsin) Product Control Statistician, U. S. Rubber Co.
4028 11th St. Place, Des Moines 13, Iowa.







LaSala, Lucy Anne. B.A. (Hunter) Applied Math Group, Columbia Univ. 256 Irving 
Avenue, Brooklyn 27, New York. 

Leone, Fred Charles. M.S. (Georgetown Univ.) Instructor in Math., Purdue Univ.
310 N. Salisbury St., W. Lafayette, Indiana.

McNamara, Kathryn J. M.A. (Clark Univ.) Economist, Business Research Dept., H. J.
Heinz Company, P.O. Box 57, Pittsburgh 30, Pa.

McPherson, John Cloud. B.S. (Princeton) Director of Engineering, Int'l Business
Machines Corp. 690 Madison Ave., New York 22, N. Y.

Millikan, Max F. Ph.D. (Yale) Asst. Director, Div. of Ship Requits., War Shipping
Admin. 2313 Huidekoper Pl., NW, Washington 7, D.C.

Morrow, Dorothy Jeanne. M.S. (Univ. of Washington) Fellow in Mathematical Statistics,
Columbia University. 605 W. 115 St., New York 25, N. Y.

Morse, John W. M.A. (Columbia) Ordnance Engineer, Quality Control Unit Inspection 
Sec., Ammunition Branch, War Dept. 11 Verne St., Bethesda, Md. 

Mottley, Charles McCammon. Ph.D. (Toronto) Lieut., USNR, Bureau of Ships, (338), 
Navy Dept., Washington, D. C. 

Murphy, Ray B. B.A. (Princeton) 2nd Lt., U.S. Marine Corps Reserve. 28 Godfrey Rd.,
Upper Montclair, N. J.

Myslivec, Vaclav. Ph.D. (Prague) Czechoslovak delegate to the United Nations Interim 
Commission on Food and Agriculture. Room 606, 1775 Broadway, New York 19, N.Y. 

Nicholson, George Edward, Jr. M.A. (Univ. of North Carolina) Asst. Mathematician, 
Applied Mathematics Group, Columbia Univ. 176 Park St., Montclair, N. J. 

Osterman, Herbert William. B.S. (Michigan) c/o B. E. Wyatt, 1029 Vermont Ave. NW,
Washington, D.C.

Parke, Nathan Grier, III. A.B. (Princeton) Senior Aviation Design Research Engr.,
Navy Dept. Malvern Ave., Ruxton 4, Md.

Priestley, Alice E. A. M.A. (New York Univ.) Instructor in Math, Lafayette College. 
226 McCartney St., Easton, Pa. 

Rapkin, Chester. M.A. (American University) Associate Statistical Analyst, Deputy 
Director, Division of Operating Statistics, Federal Home Loan Bank Admin., 2 Park 
Avenue, New York 16, N.Y. 

Rivoli, Bianca. M.A. (Columbia University) Statistician, Judson Health Center. 3265 
Bainbridge Avenue, New York 67, N. Y. 

Roshal, Sol M. B.S. (Chicago) Officer in charge of research statistics, PRU #1, Nashville 
Army Air Center, Nashville, Tenn. 

Ross, Frank A. Ph.D. (Columbia) Editor, Journal of the American Statistical Assn. 
Thetford, Vermont. 

Schaeffer, Esther. A.B. (Chicago) Technical Asst., Univ. of Michigan. 547 Elm Street, 
Ann Arbor, Mich. 

Schilling, Walter. M.D. (Harvard) Asst. Clinical Professor of Medicine, Stanford Uni- 
versity Hospital, San Francisco 15, California. 

Shannon, Claude E. Ph.D. (M.I.T.) Member of Technical Staff, Bell Telephone Labora- 
tories, 463 West Street, New York, N. Y. 

Sherman, Jack. Ph.D. (Calif. Inst. of Technology) Research Chemist, The Texas Co. 
170 Church St., Poughkeepsie, New York. 

Smallwood, Hugh M. Ph.D. (Johns Hopkins) Asst. Dept. Head, Physical Research 
Dept., U. S. Rubber Co., Market & South Streets, Passaic, N. J. 

Smith, Joan Thiede. B.S. (Minnesota) Accountant. 673 East Nebraska Ave., St. Paul,
Minn.

Smith, R. Tynes, III. Chief, Transport Economics Branch, Traffic Control Div., Office
Chief of Transportation, A.S.F. 1001 16th St. S., Arlington, Va.

Sobczyk, Andrew. Ph.D. (Princeton) Staff Member, Radiation Lab., Mass. Inst. of
Technology. 82 Bow Street, Lexington 72, Mass.







Thurstone, Louis Leon. Ph.D. (Chicago) Professor of Psychology, University of Chicago,
Chicago, Illinois.

Toralballa, Leopoldo V. Ph.D. (Michigan) Special Instructor, University of Michigan. 
1109 Willard St., Ann Arbor, Mich. 

Walsh, John E. B.S. (Notre Dame) Mathematician, Lockheed Aircraft Corp. 707 East 
Elk Ave., Glendale 5, Calif. 

Weber, Bruce Travis. M.A. (Columbia) Member of Technical Staff, Bell Telephone
Laboratories, Murray Hill, New Jersey.

Weiner, Louis. A.M. (Harvard) Economist, U. S. Bureau of Labor Statistics. 4915 
Russell Ave., Mt. Rainier, Maryland. 

Woodward, Patricia. Ph.D. (Pennsylvania) Associate Executive Secretary, Committee 
on Food Habits, National Research Council. 2101 Constitution Ave., Washington 25, 
D.C. 







REPORT ON THE WASHINGTON MEETING OF THE INSTITUTE 


The second regional spring meeting of the Institute of Mathematical Statistics
was held at George Washington University, Washington, D. C., Saturday and 
Sunday, May 6 and 7, 1944, jointly with a regional meeting of the American 
Statistical Association. The 383 registrants at the joint meeting included the 
following 76 members of the Institute: 


Paul H. Anderson, R. L. Anderson, Theodore W. Anderson, Jr., Kenneth J. Arnold,
Kenneth J. Arrow, 1st Lt., AC, Blair M. Bennett, Ernest S. Blanche, C. I. Bliss, Bonnar
Brown, Joseph G. Bryan, Marjorie F. Buck, A. George Carlton, C. W. Churchman, William
J. Cobb, Major A. C. Cohen, Jr., Edwin L. Crosby, Haskell B. Curry, J. H. Curtiss, B. B.
Day, Scott Dayton, W. Edwards Deming, Philip Desind, W. J. Dixon, J. L. Doob, Will
Feller, William C. Flaherty, Thomas N. E. Greville, E. J. Gumbel, Trygve Haavelmo,
Margaret Jarman Hagood, Morris H. Hansen, Elvin A. Hoy, Leonid Hurwicz, Lt. (jg) W. C.
Jacob, Alice S. Kaitz, Evelyn M. Kennedy, Lila F. Knudsen, H. S. Konijn, T. Koopmans,
Anita R. Kury, Jacob E. Lieberman, Philip J. McCarthy, Francis McIntyre, William G.
Madow, Lt. C. J. Maloney, Sophie Marcuse, John W. Mauchly, A. M. Mood, Vladimir A.
Nekrassoff, Monroe L. Norden, H. W. Norton, Victor Perlo, A. C. Rosander, William Salkind,
Marion M. Sandomire, Max Sasuly, Franklin E. Satterthwaite, S. B. Sells, L. W. Shaw,
W. Arthur Shelton, Walter A. Shewhart, Harry Shulman, Blanche Skalak, John H. Smith,
R. T. Smith, Arthur Stein, Joseph Steinberg, J. W. Tukey, Joseph L. Ullman, David F.
Votaw, Jr., Capt. A. N. Watson, Frank M. Weida, Louis Weiner, S. S. Wilks, Patricia
Woodward, Bertram Yood.


All sessions were held jointly with the American Statistical Association. 
Professor Will Feller of Brown University acted as Chairman for the Saturday 
afternoon session. The following papers were presented: 


1. Elements of the Theory of Testing Hypotheses. 
J. H. Curtiss, Jr., Navy Department 

2. Large Sample Tests of Statistical Hypotheses. 
Abraham Wald, Columbia University 


Professor Frank Weida of George Washington University acted as Chairman
for the Sunday morning session. The following contributed papers were
presented:


1. On the Statistics of Sensitivity Data.
C. West Churchman and Benjamin Epstein, Frankford Arsenal and the University of
Pennsylvania

2. Simplified Plotting of Statistical Observations.
E. J. Gumbel, New School of Social Research

3. Distribution of Sample Variances and Covariances of Normally Distributed Noncentral
Variables.
M. A. Girshick, Department of Agriculture

4. An Application of the Variate Difference Method to Multiple Regression.
Gerhard Tintner, Department of Agriculture

5. Autocorrelation in London Temperature.
Horace W. Norton, Department of Commerce




Professor Samuel S. Wilks of Princeton University acted as Chairman for the 
Sunday afternoon session. The following papers were presented: 


1. Regression Problems in Time Series.
Tjalling Koopmans, Combined Shipping Adjustment Board

2. Foundations of the Theory of Time Series.
J. L. Doob, Navy Department


The business meeting of the Washington Chapter of the Institute was held 
Sunday morning. The proposed Constitution of the Washington Chapter was 
ratified, and elections under this constitution were held for the first time. The 
following officers were elected: 


William G. Madow, Census Bureau—3 year term, Secretary of the Program Committee 
1944-45 


Solomon Kullback, War Department—2 year term 
Frank Weida, George Washington University—1 year term. 


W. G. Madow
Secretary, Program Committee 
Washington Chapter 













